Re: [whatwg] Specs for window.atob() and window.btoa()

Simon Pieters Fri, 13 May 2011 07:04:41 -0700

On Thu, 12 May 2011 00:13:37 +0200, Ian Hickson <[email protected]> wrote:

On Fri, 4 Feb 2011, Jorge wrote:


Wrt to the note "some base64 encoders add newlines or other whitespace
to their output. atob() throws an exception if its input contains
characters other than +/=0-9A-Za-z, so other characters need to be
removed before atob() is used for decoding" in
http://aryeh.name/spec/base64.html , I think that in the end it's better
to ignore any other chars instead of throwing, because skipping over any
such chars while decoding is cheaper and requires less memory than
scanning the input twice, first to clean it and second to decode it,
something you'd not want to end up doing -just in case- everytime.

Say, for example, that you've got a 4MB base64 with (perhaps?) some
whitespace, in order to clean it up you're going to have to have it in
memory along the cleaned up version at least while constructing the
clean version, but if atob() skipped over anything other than
+/=0-9A-Za-z you could just pass it directly, and the whole process
would be even faster too, given there was no need to clean it up first.
FWIW, that's how nodejs is doing it right now.

Also, some tools (e.g. the openssl decoder) *expect* the newlines to be
there, and fail if they aren't.


On Fri, 4 Feb 2011, Boris Zbarsky wrote:


The problem is that at least some current browsers (which ones?) throw.
So you wouldn't be able to rely on the non-throwing behavior anyway....


On Fri, 4 Feb 2011, Aryeh Gregor wrote:


Everyone except Opera throws on invalid characters in atob() input, and
IIRC, I was told by Opera devs that not throwing caused compat problems
for them.  So I don't think this is worth trying to change.


On Fri, 4 Feb 2011, Jorge wrote:


On the other hand, it will be so forever unless the spec says *not* to
throw but to skip over instead, so that in a few years the cleanup can
be ~safely skipped.


On Fri, 4 Feb 2011, Aryeh Gregor wrote:


Nope.  The spec isn't going to change browser behavior here if there are
sites that depend on the current behavior -- and reportedly there are.
There's just no incentive for browsers to change; the proposed behavior
isn't sufficiently superior to warrant even slight compatibility pain.
We can change web APIs in ways that might cause some compatibility pain
if we have good reason, but for really minor things like this it's just
not worth it.  Browsers can only afford to break a certain number of
websites per release before users start to get annoyed, and we shouldn't
be wasting it on things like this.


On Sat, 5 Feb 2011, Jorge wrote:


How is this :

try {
  var result= atob(input); // will throw if input has whitespace
}
catch (e) {
  try {

var result= atob( input.replace(/\s/g, '') ); // will throw ifinput is not proper base64

  }
  catch (e) {
    throw e;
  }
}

any better than :

var result= atob(input); // will throw if input is not proper base64

?


On Sat, 5 Feb 2011, Simon Pieters wrote:


Is the compat problem for not throwing for whitespace or for not
throwing for other garbage? If it's for other garbage, we could allow
whitespace but throw for other garbage. (The bugs I can find in our
database with a quick search is about non-ASCII characters not
throwing.)

Better performance seems like an incentive.


On Sat, 5 Feb 2011, Aryeh Gregor wrote:


Opera people were the only ones who told me about these compat problems,
so it could be just non-ASCII characters.  I went with Gecko's behavior
exactly because it seemed simpler than WebKit's and I had been told
Opera's wasn't fully web-compatible.  Both Gecko and WebKit do throw on
any whitespace.


On Sat, 5 Feb 2011, Jonas Sicking wrote:


As a firefox developer, I'd be interested in avoiding throwing if it can
make things easier for authors (and it is web compatible).

So my first question is, can someone give examples of sources of base64
data which contains whitespace?

I agree that this function probably doesn't appear in a lot of
performance critical code paths. However it might show up in places
which deal with large bodies of data, so if people can avoid cloning
that data unnecessarily then that's a win.


On Sat, 5 Feb 2011, Joshua Cranmer wrote:


The best guess I have is base64-encoding MIME parts, which would be
hardwrapped every 70-80 characters or so.


On Sat, 5 Feb 2011, Joshua Bell wrote:


RFC 3548 "The Base16, Base32, and Base64 Data Encodings" Section 2.1
discusses line feeds in encoded data, calling out the MIME line length
limit. For example, Perl's MIME::Base64 has an encode_base64() API that
by default inserts newlines after 76 characters. (An optional argument
allows this behavior to be overridden.)

Section 2.3 discusses "Interpretation of non-alphabet characters in
encoded data" specifically in base64 (etc) encoded data.


On Sun, 6 Feb 2011, Jorge wrote:


$ openssl enc -base64 ... inserts newlines too.


The argument for changing this seems somewhat compelling, if browsers are
willing to change, especially just for the whitespace case. My

recommendation for people who care about this is to get browser vendorsto

make this change and see if it causes compatibility problems. If it

doesn't, we can update the spec. Please feel free to cc me on therelevantbugs if you would like my help in convincing browser vendors to trythis. :-)

We're making this change in Opera (we'll ignore "space characters"http://www.whatwg.org/specs/web-apps/current-work/complete/common-microsyntaxes.html#space-characterin atob).


--
Simon Pieters
Opera Software

Re: [whatwg] Specs for window.atob() and window.btoa()

Reply via email to