On 04/02/2011, at 18:58, Jonas Sicking wrote:
> On Fri, Feb 4, 2011 at 8:37 AM, Jorge <[email protected]> wrote:
>> Hi,
>> 
>> Wrt to the note "some base64 encoders add newlines or other whitespace to 
>> their output. atob() throws an exception if its input contains characters 
>> other than +/=0-9A-Za-z, so other characters need to be removed before 
>> atob() is used for decoding" in http://aryeh.name/spec/base64.html , I think 
>> that in the end it's better to ignore any other chars instead of throwing, 
>> because skipping over any such chars while decoding is cheaper and requires 
>> less memory than scanning the input twice, first to clean it and then to 
>> decode it, something you'd not want to end up doing -just in case- every time.
>> 
>> Say, for example, that you've got a 4MB base64 string with (perhaps?) some 
>> whitespace: in order to clean it up you're going to have to hold it in 
>> memory alongside the cleaned-up version, at least while constructing the 
>> clean version. But if atob() skipped over anything other than +/=0-9A-Za-z 
>> you could just pass it in directly, and the whole process would be faster 
>> too, given there'd be no need to clean it up first. FWIW, that's how nodejs 
>> is doing it right now.
> 
> Not sure I follow you. Why not simply measure the length of the string
> (most implementations keep that around for fast access), and
> optimistically allocate enough memory to hold the expected result.
> Then start converting. As you're converting, if you find an
> unrecognized character, just free the allocated memory and throw an
> exception.
> 
> No need to scan twice.
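
For reference, the single-pass scheme Jonas describes could be sketched roughly like this (atobStrict and the array standing in for the pre-allocated buffer are illustrative, not how any engine actually implements it):

```javascript
// Base64 alphabet in index order: value of a character is its position here.
var B64 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

function atobStrict(input) {
  // Ignore trailing '=' padding.
  var len = input.length;
  while (len > 0 && input.charAt(len - 1) === '=') len--;

  var out = [];            // stands in for the optimistically allocated buffer
  var buffer = 0, bits = 0;
  for (var i = 0; i < len; i++) {
    var v = B64.indexOf(input.charAt(i));
    if (v === -1) {
      // Unrecognized character: abandon the output and throw, as atob() does.
      throw new Error('INVALID_CHARACTER_ERR');
    }
    buffer = (buffer << 6) | v;
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      out.push(String.fromCharCode((buffer >> bits) & 0xFF));
    }
  }
  return out.join('');
}
```

So the cost of an invalid input is one partial pass plus freeing the buffer; valid inputs are decoded in a single pass with one allocation.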

I was thinking about this:

var result = atob(base64_inputStr.replace(/\s/g, ''));

The first scan happens in .replace(), the second in atob(). The intermediate 
value stays in memory (at least for a little while) alongside base64_inputStr.
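
The skip-over alternative I'm suggesting could be sketched like this (atobLenient is a hypothetical name, not proposed spec text): one pass, no intermediate cleaned-up string, anything outside the alphabet is simply ignored.

```javascript
var B64 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

function atobLenient(input) {
  var out = [], buffer = 0, bits = 0;
  for (var i = 0; i < input.length; i++) {
    var c = input.charAt(i);
    if (c === '=') break;      // padding marks the end of the data
    var v = B64.indexOf(c);
    if (v === -1) continue;    // skip whitespace etc. instead of throwing
    buffer = (buffer << 6) | v;
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      out.push(String.fromCharCode((buffer >> bits) & 0xFF));
    }
  }
  return out.join('');
}
```

With this, the 4MB input with embedded newlines decodes in one pass with one allocation, and no cleaned-up copy ever exists.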
-- 
Jorge.
