On 04/02/2011, at 18:58, Jonas Sicking wrote:

> On Fri, Feb 4, 2011 at 8:37 AM, Jorge <[email protected]> wrote:
>> Hi,
>>
>> Wrt the note "some base64 encoders add newlines or other whitespace to
>> their output. atob() throws an exception if its input contains characters
>> other than +/=0-9A-Za-z, so other characters need to be removed before
>> atob() is used for decoding" in http://aryeh.name/spec/base64.html , I think
>> that in the end it's better to ignore any other chars instead of throwing,
>> because skipping over such chars while decoding is cheaper and requires
>> less memory than scanning the input twice, first to clean it and second to
>> decode it, something you'd not want to end up doing -just in case- every time.
>>
>> Say, for example, that you've got a 4MB base64 string with (perhaps?) some
>> whitespace. In order to clean it up you're going to have to hold it in
>> memory alongside the cleaned-up version, at least while constructing the
>> clean version. But if atob() skipped over anything other than +/=0-9A-Za-z
>> you could just pass it in directly, and the whole process would be faster
>> too, given there'd be no need to clean it up first. FWIW, that's how nodejs
>> is doing it right now.
>
> Not sure I follow you. Why not simply measure the length of the string
> (most implementations keep that around for fast access), and
> optimistically allocate enough memory to hold the expected result.
> Then start converting. As you're converting, if you find an
> unrecognized character, just free the allocated memory and throw an
> exception.
>
> No need to scan twice.
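Jonas's single-scan idea might look like the sketch below. The function name `strictAtob`, the `Error` message, and the simplified padding handling are all assumptions for illustration; this is not the spec's actual algorithm, just the "allocate optimistically, convert in one pass, throw on the first bad character" shape he describes.

```javascript
// Hypothetical sketch of the single-scan, throw-on-bad-input approach.
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

function strictAtob(input) {
  // Optimistically size the output: every 4 base64 chars yield up to 3 bytes.
  const bytes = new Uint8Array(Math.ceil(input.length / 4) * 3);
  let n = 0, buffer = 0, bits = 0;
  for (const ch of input) {
    if (ch === "=") break;              // padding ends the data (simplified)
    const v = ALPHABET.indexOf(ch);
    if (v === -1) {
      // Unrecognized character: abandon the output and throw.
      // No second scan was ever needed.
      throw new Error("InvalidCharacterError");
    }
    buffer = (buffer << 6) | v;         // accumulate 6 bits per character
    bits += 6;
    if (bits >= 8) {                    // emit a full byte when available
      bits -= 8;
      bytes[n++] = (buffer >> bits) & 0xff;
    }
  }
  return String.fromCharCode(...bytes.subarray(0, n));
}
```

On clean input it decodes normally; input containing whitespace throws partway through, with only the one optimistic allocation ever made.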
I was thinking about this:

  var result = atob( base64_inputStr.replace(/\s/g, '') );

The first scan happens in .replace(), the second in atob(). The intermediate value stays in memory (at least for a little while) alongside base64_inputStr.

--
Jorge.
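Jorge's alternative, a decoder that silently skips characters outside the base64 alphabet, can be sketched in one pass with no intermediate cleaned-up copy. The name `lenientAtob` is hypothetical; the spec's atob() throws instead of skipping.

```javascript
// Hypothetical sketch: decode base64 in a single pass, skipping any
// character outside +/=0-9A-Za-z instead of throwing.
const B64 =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

function lenientAtob(input) {
  let out = "";
  let buffer = 0;   // bit accumulator
  let bits = 0;     // bits currently held in the accumulator
  for (const ch of input) {
    if (ch === "=") continue;          // padding carries no data
    const v = B64.indexOf(ch);
    if (v === -1) continue;            // skip whitespace / foreign chars
    buffer = (buffer << 6) | v;
    bits += 6;
    if (bits >= 8) {                   // emit a full byte when available
      bits -= 8;
      out += String.fromCharCode((buffer >> bits) & 0xff);
    }
  }
  return out;
}

// Whitespace-laden input decodes directly, no .replace() pre-pass:
lenientAtob("SGVs\nbG8s IHdv cmxk IQ==");  // "Hello, world!"
```

Compared with the `.replace(/\s/g, '')` line above, the input string is traversed once instead of twice, and no second multi-megabyte string is ever materialized.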
