Searching for base64url does make it better. Thanks for that pointer, Dick.

We don't think base64url will work, because the most common error we'll see
is that developers forget the "url" part and just do plain base64, and
that's not sufficient because the stock alphabet includes +.

So it may or may not work. They may also urlencode on top of it anyway,
since if they are passing this as a query param or as post data, client
libraries will take a dict and try to "do the right thing". Then we end up
with pluses, and we're not quite sure whether they should be urldecoded or
not.

For example, this doesn't give me back what I put in:

  base64_decode(urldecode(base64_encode('-+>')))

Because base64_encode returns something containing a +, and the usual
urldecode logic will convert that into a space. In fact, looking at the PHP
docs for base64_decode, there are a bunch of comments talking about pluses:
http://php.net/manual/en/function.base64-decode.php.
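
To make that concrete, here's a quick (untested) walk-through of that
expression:

  $encoded = base64_encode('-+>');  // 'LSs+', which contains a +
  $decoded = urldecode($encoded);   // the + becomes a space: 'LSs '
  var_dump(base64_decode($decoded) === '-+>');  // false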

The approach I'm thinking of is something like:

  function getSignedToken($params, $secret) {
    $json = json_encode($params);
    // Moving the signature to be before the payload makes the separator/split
    // logic be: "everything before the first dot is the sig". Also, it's hex.
    return hash_hmac('sha256', $json, $secret) . '.' . $json;
  }

  function parseSignedToken($token, $secret) {
    // everything before the first dot is the hex sig; the rest is the JSON
    list($sig, $json) = explode('.', $token, 2);
    if (hash_hmac('sha256', $json, $secret) !== $sig) {
      throw new Exception('Bad Sig');
    }
    return json_decode($json, true); // true: decode back to an array, like the input
  }

[UNTESTED!] URL encoding and decoding happen outside of the signature
process.
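
A rough sketch of the usage I have in mind (also untested, and the
signed_request parameter name and URL are just placeholders):

  $token = getSignedToken(array('user' => 42), $secret);
  // url encoding happens only at the transport boundary:
  $url = 'https://example.com/cb?signed_request=' . rawurlencode($token);

  // most frameworks hand the receiver the already-decoded value:
  $params = parseSignedToken($_GET['signed_request'], $secret);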

I'm not convinced that sending the request as a header is going to be used
often. Headers are always sent uncompressed. I imagine signatures will be
used by those who are perf sensitive and don't want the SSL overhead, so
this minor detail will matter to them. This is especially true because,
unlike OAuth1, the header doesn't just contain a signature, it contains the
entire signed payload. But there's nothing stopping us from using the same
format above for sending the payload as a header.
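
Something like this, where the header name is just a made-up placeholder:

  X-Signed-Request: <hex hmac-sha256 sig>.{"example":"payload"}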

One of the arguments was about detecting double url encoding. Since the
thing we're signing is a JSON object, we know the urlencoded form will
always start with %7B, and if it's been double encoded it will start with
%257B instead.
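
As a sketch (untested, and assuming we're looking at the raw, still-encoded
JSON part of the query string):

  function detectEncoding($rawJson) {
    // one urlencode turns '{' into %7B; a second one turns that % into %25
    if (strpos($rawJson, '%257B') === 0) return 'double';
    if (strpos($rawJson, '%7B') === 0) return 'single';
    return 'none'; // starts with a literal '{'
  }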


-Naitik

On Fri, Jun 25, 2010 at 5:21 PM, Dick Hardt <[email protected]> wrote:

> The RFC term is base64url which turns up much better results when
> searching. "URL safe base64" is also a good search term.
>
> Note that the token may also be included in the HTTP header. base64url
> encoding works well for HTTP headers. Note that the token is opaque to the
> client, so being plain text is not as useful as it may seem. Unlike URL
> encoding, it is pretty easy to see if a string is base64url or not. With URL
> encoding, it may or may not be encoded which sometimes leads to double
> encoding / mismatched encoding.
>
> -- Dick
>
> On 2010-06-25, at 4:42 PM, Naitik Shah wrote:
>
> So my litmus test was looking on the web for "web base 64" or "web base64".
> Both yield nothing useful. Looking at the docs for PHP, it doesn't seem to
> support it, Python does, Ruby doesn't seem to. Java doesn't seem to have a
> native base64, and the C# one doesn't seem to have the web version (a bit
> unsure about these two). The point I'm trying to make is that for someone who
> comes and reads the docs for an API where we're talking about signatures, web
> base64 will need to be explained in detail.
>
> I agree eliminating encoding problems is a good goal. But if we do have to
> teach the developer a new, or slightly modified encoding scheme, why not
> have it be URL encoding? It seems more valuable in the long term, since it's
> a spec that shows up more frequently on the web. And OAuth 1 has already
> taught this spec to at least some of the target audience.
>
> While encoding/decoding with any format will result in some learning, I
> think if we sign a JSON blob sans urlencoding or web-base64, it might work
> out to be easier. The fact that it's still mostly plain text is also a plus
> imho.
>
>
> -Naitik
>
>
> On Fri, Jun 25, 2010 at 1:36 PM, John Panzer <[email protected]> wrote:
>
>> There are 2 characters that are different between base64 and base64url.
>>  Many good libraries support both (as they're both useful, and both are in
>> the base64 RFC spec); the ability to eliminate a class of encoding problems
>> seems like a good trade-off for, in some languages without full base64
>> support, an additional substitution of 2 characters.
>>
>> --
>> John Panzer / Google
>> [email protected] / abstractioneer.org / @jpanzer
>>
>>
>>
>> On Fri, Jun 25, 2010 at 12:15 PM, Naitik Shah <[email protected]> wrote:
>>
>>> On Fri, Jun 25, 2010 at 11:39 AM, Breno <[email protected]> wrote:
>>>
>>>> On Fri, Jun 25, 2010 at 10:49 AM, Luke Shepard <[email protected]>
>>>> wrote:
>>>> > Brian, Dirk - just wondering if you had thoughts here?
>>>> >
>>>> > The only strong reason I can think of for base64 encoding is that it
>>>> allows for a delimiter between the body and the signature. Is there any
>>>> other reason?
>>>>
>>>> Without base64 encoding we have to define canonicalization procedures
>>>> around spaces and we still have to URL encode separator characters
>>>> such as {. There is also the risk that developers might be confused
>>>> whether the URL encoding is to be performed before or after
>>>> computation of the signature.  If you say that the signature is
>>>> computed on the base64 encoded blob, there's less scope for confusion
>>>> and interoperability issues.
>>>>
>>>>
>>> Yep, I get that the "web" version makes the url encoding a no op. But I
>>> fear we're trading one spec (urlencoding) to another one (web base64). I'm
>>> imagining the sample code (that does not rely on an SDK) we'd give out to
>>> developers in our docs, and the thing that stands out is the "web" part in
>>> the web_base64. It means that our sample code will look like
>>>
>>>   str_replace("+", "-", base64_encode(json_encode($data)))
>>>
>>> or for validating signatures:
>>>
>>>   json_decode(base64_decode(str_replace("-", "+", $data)))
>>>
>>> The str_replace() really stands out. From my quick read, it seemed like
>>> there were one or two other characters that needed to get replaced too.
>>> While some languages (like PHP) support arrays to specify multiple
>>> replacement patterns, in other languages you'll end up with a few
>>> str_replace calls. It would be nice if that wasn't necessary.
>>>
>>> I'm wondering if we can get away with "urlencode(json_encode(data) + '.'
>>> + sig)" as the value. Then, instead of the str_replace needed to get normal
>>> base64 logic to work, we would need an rsplit or something, since the dot is
>>> not a reserved character in the json blob. Was that approach considered?
>>>
>>>
>>> -Naitik
>>>
>>>
>>>
>>
>
>
>
