Re: [websec] [decade] Digest: Adventures in encoding

Martin J. Dürst Fri, 07 Oct 2011 04:23:18 -0700

Hello Stephen,

On 2011/10/07 17:54, Stephen Farrell wrote:


Hi Phill,

OAuth [1] uses the '"application/x-www-form-urlencoded" format as defined by
[W3C.REC-html401-19991224]' all over the place to solve basically
this problem, but in the context of HTTP URLs, which has to be worse
than for a new URI scheme.

Why not do the same here?

Are you thinking about this for escaping characters such as slashes? Or do you think we should take the binary hash, look at it as a series of 8-bit bytes, and escape each of these bytes with %-encoding?

For the former, we already have the filesystem-safe version (passing a slash with %-encoding to a decent server might allow the server to use the slash *in* the filename, but that would confuse users and a whole lot of other software).

For the latter, this would essentially be base16 with a lot of '%' characters mixed in.
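To make the second reading concrete, here is a minimal Python sketch that escapes every byte of a digest as %XX. The digest is the Base16 sample quoted further down in this thread; the helper name `percent_encode_all` is made up for illustration and is not part of any proposal.

```python
import binascii

# Sample SHA-256 digest, copied from the Base16 example quoted later in the thread.
DIGEST_HEX = "B77B635B2832BF95E8F2935963F134A41F4F11C0BEDD6CED2C5E551F288D9980"

def percent_encode_all(data: bytes) -> str:
    """Escape every byte as %XX, whether or not it would be URI-safe (hypothetical helper)."""
    return "".join("%{:02X}".format(b) for b in data)

digest = binascii.unhexlify(DIGEST_HEX)
escaped = percent_encode_all(digest)

# Dropping the '%' signs leaves exactly the base16 form.
assert escaped.replace("%", "") == DIGEST_HEX
print(len(DIGEST_HEX), len(escaped))  # 64 characters versus 96
```

So the escaped form costs three characters per byte where base16 costs two, without carrying any extra information.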

As for Phil's original ideas, if there's a security issue with distinguishing upper- and lowercase filenames on Windows, then abandoning base64 may be a good idea. I wouldn't know of and couldn't imagine a structural security issue, but then I'm no security expert. There's certainly some erosion, though: up to one bit per 6 bits if all of the characters are alphabetic.
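The size of that erosion can be estimated with a short Python sketch. It assumes each Base64url character is equally likely (reasonable for a hash) and models a case-insensitive comparison by lowercasing the alphabet; the function name `entropy_bits` is only illustrative.

```python
import math
import string
from collections import Counter

# Base64url alphabet (RFC 4648, section 5).
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "-_"

def entropy_bits(symbols: str) -> float:
    """Shannon entropy per character when each entry of `symbols` is equally likely."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

exact = entropy_bits(ALPHABET)           # case-sensitive comparison
folded = entropy_bits(ALPHABET.lower())  # case-insensitive comparison
print(exact, folded, exact - folded)
```

A character that happens to be a letter loses its full case bit, which is the "one bit per 6 bits" worst case; averaged over the whole alphabet the loss is somewhat below one bit per character, because digits and the two punctuation characters are unaffected.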

As for Base32, I don't like it too much because it seems to be totally new, and I prefer using existing stuff if it's good enough. This may show up in code. In scripting languages (think Perl, Ruby, ...), Base64 and Base16 are simple pack operations, and Base64url is a pack and a tr, but Base32 needs hand-coding. Also, Base32 aligns 8 characters to 5 bytes, Base64 4 characters to 3 bytes, and Base16 2 characters to 1 byte, and I somehow prefer a more regular alignment (although the reason for this may be that I never completely figured out how to handle the non-aligned stuff at the end).
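As an illustration of the "pack and a tr" point and of the alignment figures, here is a small Python sketch. Python is used only as a stand-in for the scripting languages mentioned; unlike the situation described for Perl and Ruby, its standard `base64` module happens to include Base32. The input string is a made-up placeholder, and padding is stripped to compare raw lengths.

```python
import base64
import hashlib

# Any 32-byte SHA-256 digest will do; the input here is a made-up placeholder.
digest = hashlib.sha256(b"hypothetical example content").digest()

b16 = base64.b16encode(digest).decode("ascii")
b32 = base64.b32encode(digest).decode("ascii").rstrip("=")
b64 = base64.b64encode(digest).decode("ascii").rstrip("=")

# Base64url is plain Base64 followed by a two-character translation (the "tr" step).
b64url = b64.translate(str.maketrans("+/", "-_"))
assert b64url == base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")

for name, text in [("base16", b16), ("base32", b32), ("base64url", b64url)]:
    print(name, len(text))  # 64, 52 and 43 characters for a 32-byte digest
```

Because 32 bytes is a multiple of neither 5 nor 3, both Base32 and Base64 end on a partial group here, which is exactly the non-aligned tail mentioned above.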

As for the Base32 with slashes, why not just leave the choice of having slashes or not to the creator, and require the consumer to take them out (as Phil proposed anyway)? But with the "slashes being taken out" I don't understand Phil's argument about direct mapping to file names.

On the other hand, I think allowing both Base32s and Base64url is a non-starter. Why have two if one is good enough? And then you can't map to URI paths because somebody could transform the digest from one version to another to squeeze it.

Also, I don't understand Phil's length issues (Base32 is almost as good as Base64, so it's good, but Base16 is too long); he seems to have a specific upper length in mind for a specific use case, which I think is always a bad idea. (Others may have other limits that are a bit shorter or a bit longer.)

What I still don't understand is where the pressure for having to have the digest and part of the URI path needing to be the same comes from. If I want to have a digest protection for a page directly at http://www.example.org/, shouldn't I be able to do so? I tried to ask about this previously, but I still didn't get the point, sorry.


Regards,   Martin.

S.

[1] http://tools.ietf.org/html/draft-ietf-oauth-v2-22#section-4.1.1

On 10/07/2011 03:49 AM, Phillip Hallam-Baker wrote:

Following on the base16/base64 discussion, I have written some code (see
end) and have some ni digests in various flavors of encoding.

My conclusion is that we should split the difference and do base32
instead.
I think the arguments are actually quite compelling.

This is only an encoding issue. So choosing an encoding that requires the
least number of systems to be touched is my priority. Choosing base32 allows
the resolution scheme to be supported by unmodified Apache and IIS.

The additional code burden for ni/digest implementers to write base32 is
trivial.


There is also the option of doing more than one encoding. But Base64 uris
are only slightly shorter.


*Base16*
ni:sha-256;B77B635B2832BF95E8F2935963F134A41F4F11C0BEDD6CED2C5E551F288D9980


Problem - very long even without separators.


*Base32:*
ni:sha-256;W5AGGABIGKAJLAHSSNAGHABUUQAE6AGAX3AGZABMLZAB6AENTGAA

Advantage: more compact than Base16 (somewhat), can be read out over a
telephone or terminal room (try that with Base64). Can be printed as a
static reference in a journal or equivalent.


*Base32s:*
ni:sha-256;W5AGGA-BIGKAJ-LAHSSNA-GHABUU-QAE6AGA-X3AGZA-BMLZAB-6AENTGAA

This is my own invention, basically Base32 with separators added for
readability.


*Base64:*
ni:sha-256;t3tjWygyv5Xo8pNZY/E0pB9PEcC+3WztLF5VHyiNmYA=

This is the traditional base64 encoding.

Process disadvantage: You know that someone is bound to challenge the
forward slash just as the document gets to last call. I don't see an
advantage to risk a discuss.

Practical disadvantage: Gets messed up when converted to a well-known URL.
Consider the following:

ni:sha-256;t3tjWygyv5Xo8pNZY/E0pB9PEcC+3WztLF5VHyiNmYA=?http=example.com

This would map to:

http://example.com/.well-known/ni/sha-256/t3tjWygyv5Xo8pNZY/E0pB9PEcC+3WztLF5VHyiNmYA=


The forward slash is not a hierarchy indicator, which is bad. Worse still,
this would mean that support for ni digest objects requires a code plug-in
for Apache, IIS etc. rather than just mapping .well-known/ni/sha-256 to the
directory with the digest values.
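The effect of the embedded slash can be shown with a short Python sketch that builds the .well-known URL by plain concatenation, as in the example above, and then splits the path the way a generic server or URL library would. The helper names are hypothetical; the two digest strings are the Base64 and Base64url samples from this message.

```python
from urllib.parse import urlsplit

def well_known_url(authority: str, algorithm: str, value: str) -> str:
    """Build the .well-known locator by plain concatenation (hypothetical helper)."""
    return "http://{}/.well-known/ni/{}/{}".format(authority, algorithm, value)

def digest_segments(value: str) -> list:
    """Path segments that follow /.well-known/ni/sha-256/ once the URL is parsed."""
    path = urlsplit(well_known_url("example.com", "sha-256", value)).path
    return path.split("/")[4:]

b64 = "t3tjWygyv5Xo8pNZY/E0pB9PEcC+3WztLF5VHyiNmYA="
b64url = "t3tjWygyv5Xo8pNZY_E0pB9PEcC-3WztLF5VHyiNmYA"

for value in (b64, b64url):
    # One segment maps to one file name; more than one looks like a subdirectory.
    print(len(digest_segments(value)), digest_segments(value))
```

With the plain Base64 value the digest is split across two path segments, so a static directory mapping would look for a subdirectory rather than a single file.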


*Base64url:*

ni:sha-256;t3tjWygyv5Xo8pNZY_E0pB9PEcC-3WztLF5VHyiNmYA

This is essentially the same as base64 for size etc. The only disadvantage
is that the encoder has to be written from scratch. (Mine took 20 mins).

The advantage over plain base64 is that there is no code required to support
the .well-known version of the locator scheme on the server at all. Just
some admin stuff. Also the URL is completely compatible with URI process
lore.


*Summary:*

Arguments can be made for each one of these schemes. I think the argument
for Base16 is the weakest since Base32 can do everything that Base16 can do.
Neither is implemented as a standard library function on my platform.

If we went for Base32, I would argue for allowing some form of readability
separator. Base32s is much easier to transcribe than plain Base32.

Base64/Base64url offer the best compression in a practical form. For most
applications a truncated digest will be acceptable. The main disadvantage of
the Base64 schemes is that they are case sensitive. This will play merry
heck with case insensitive but case preserving file systems such as Windows.

I think we should knock out base16 from consideration as base32 does it
better.

Since it is fairly easy to write a filter that strips out the separators, I
would argue for allowing separators at any point in the identifier but that
the canonical form is with the separators stripped out and this is what is
used to create the URL form. So

ni:sha-256;W5AGGA-BIGKAJ-LAHSSNA-GHABUU-QAE6AGA-X3AGZA-BMLZAB-6AENTGAA

would map to

http://example.com/.well-known/ni/sha-256/W5AGGABIGKAJLAHSSNAGHABUUQAE6AGAX3AGZABMLZAB6AENTGAA


This preserves the criterion that Apache, IIS etc. can be configured to
resolve these identifiers without new code.
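A minimal Python sketch of that filter and mapping follows. It assumes '-' is the only separator and that the authority is supplied by the caller; the function names are hypothetical. Note that this stripping is only safe for Base32s, since '-' is itself a data character in Base64url.

```python
def canonical_ni_value(value: str, separator: str = "-") -> str:
    """Strip readability separators to get the canonical digest string."""
    return value.replace(separator, "")

def ni_to_well_known(ni_uri: str, authority: str) -> str:
    """Map an ni: identifier of the form ni:<alg>;<value> to its .well-known URL."""
    scheme, rest = ni_uri.split(":", 1)
    if scheme != "ni":
        raise ValueError("not an ni URI: " + ni_uri)
    algorithm, value = rest.split(";", 1)
    return "http://{}/.well-known/ni/{}/{}".format(
        authority, algorithm, canonical_ni_value(value))

# The Base32s sample from this message.
sample = "ni:sha-256;W5AGGA-BIGKAJ-LAHSSNA-GHABUU-QAE6AGA-X3AGZA-BMLZAB-6AENTGAA"
print(ni_to_well_known(sample, "example.com"))
```

The server side then needs nothing beyond a static mapping of .well-known/ni/sha-256 to a directory, since the canonical value contains no slashes or separators.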


Taking out Base 16 and base32 without separators, I see the following
options:

Base32s only
Base64 only
Base64url only
Base32s + Base64url

I can see pros and cons for the base32 encoding. To make it really readable
it is necessary to put in the separators, which is something of an overhead.
So I can see an argument for both.

But if we only pick one I would say take base32 with separators. It is not
the most compact but it is good enough. It is fully URL compatible and has
the readability benefit.


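Here is Base32 in C#:

// Assumed RFC 4648 alphabet; the table itself was not included in the message.
const string BASE32 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";

string ToBase32String(byte[] data, int Length) {
    string result = "";
    int offset = 0;   // number of unconsumed bits held in a
    int a = 0;        // bit accumulator

    for (int i = 0; i < Length; i++) {
        a = (a << 8) | data[i];
        offset += 8;

        while (offset >= 5) {
            offset -= 5;

            int n = a >> offset;           // top five buffered bits
            result = result + BASE32[n];
            // Keep only the unconsumed low bits. The mask must also work
            // when more than five bits remain (offset of 6 or 7).
            a = a & ((1 << offset) - 1);
        }
    }

    if (offset > 0) {
        // Left-align the leftover bits, padding with zero bits on the right.
        result = result + BASE32[a << (5 - offset)];
    }
    return result;
}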

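Here is base64url in C#:

// Assumed RFC 4648 URL-safe alphabet; the table was not included in the message.
const string BASE64URL =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

string ToBase64urlString(byte[] data, int Length) {
    string result = "";
    int offset = 0;   // number of unconsumed bits held in a
    int a = 0;        // bit accumulator

    for (int i = 0; i < Length; i++) {
        a = (a << 8) | data[i];
        offset += 8;

        //Console.WriteLine ("{0:x4}/{2:3} : {1}", a, result, offset);

        while (offset >= 6) {
            offset -= 6;

            int n = a >> offset;           // top six buffered bits
            result = result + BASE64URL[n];
            a = a & ((1 << offset) - 1);   // keep only the unconsumed low bits
        }
    }
    if (offset > 0) {
        // Left-align the leftover bits, padding with zero bits on the right.
        result = result + BASE64URL[a << (6 - offset)];
    }
    return result;
}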
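For cross-checking, the same bit-accumulator technique can be sketched in Python and compared against the standard library. This is an illustrative sketch, not the code used to produce the strings in this thread: the two alphabets are assumed to be the RFC 4648 tables, `encode_bits` is a made-up name, and padding characters are omitted.

```python
import base64

# Assumed RFC 4648 alphabets (standard Base32 and URL-safe Base64).
BASE32 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
BASE64URL = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"

def encode_bits(data: bytes, alphabet: str, width: int) -> str:
    """Unpadded encoding that emits `width` bits per output character."""
    out = []
    acc = 0    # bit accumulator
    nbits = 0  # number of unconsumed bits held in acc
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= width:
            nbits -= width
            out.append(alphabet[acc >> nbits])
            acc &= (1 << nbits) - 1  # keep only the unconsumed low bits
    if nbits:
        # Left-align the leftover bits; the missing low bits are zero.
        out.append(alphabet[acc << (width - nbits)])
    return "".join(out)

# The sample digest from the Base16 example in this message.
digest = bytes.fromhex(
    "B77B635B2832BF95E8F2935963F134A41F4F11C0BEDD6CED2C5E551F288D9980")

assert encode_bits(digest, BASE64URL, 6) == "t3tjWygyv5Xo8pNZY_E0pB9PEcC-3WztLF5VHyiNmYA"
assert encode_bits(digest, BASE32, 5) == base64.b32encode(digest).decode().rstrip("=")
```

The two details that are easy to get wrong are the mask after each emitted character, which has to cope with more than `width` bits remaining, and the left shift of the final partial group.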



_______________________________________________
decade mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/decade

_______________________________________________
websec mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/websec
