On Wed, Aug 15, 2012 at 5:30 PM, Glenn Maynard <[email protected]> wrote:
> On Tue, Aug 14, 2012 at 12:34 PM, Joshua Bell <[email protected]> wrote:
>
>> - Create an encoder with TextDecoder() and if present a BOM will be
>> respected (and consumed) otherwise default to UTF-8
>>
>
> Let's not default to "autodetect Unicode formats". It encourages people
> to support UTF-16 when they may not mean to. If BOM detection for both
> UTF-8 and UTF-16 is wanted, I'd suggest something explicit, like "utf-*".
>
> If the argument to the ctor is optional, I think the default should be
> purely UTF-8.
>

Works for me. In the algorithm specified in the email, this simply removes
the clause "If encoding is not specified, set an internal useBOM flag"; as
a result, only "utf-16" gets the useBOM flag. I'll attempt to wedge this
into the spec soon.

>> This gets easier if we restrict to encoding UTF-8 which typically doesn't
>> include BOMs. But it's looking like there's enough desire to keep UTF-16
>> encoding at the moment. Agree with just stripping it for now.
>>
>
> UTF-8 sometimes does have a BOM, especially in Windows where applications
> sometimes use it to distinguish UTF-8 from ACP text files (which are just
> as common as ever--Windows has made no motion away from legacy encodings
> whatsoever).
>

Good point. Ah, Notepad, my old friend...

> Stripping the BOM can cause those applications to misinterpret the files
> as ACP.
>
> Anyway, even if the encoding API gives a "helper" for this, figuring out
> how that works would probably be more effort for developers than just
> peeking at the ArrayBuffer for the BOM and adding it back in manually.
> (I'm pretty sure anybody who knows enough to pay attention to this in the
> first place will have no trouble doing that.) So, yeah, let's not worry
> about this.
>
> --
> Glenn Maynard
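For readers following the BOM discussion, here is a minimal TypeScript sketch (not taken from the thread or from any spec) of two ideas mentioned above: the agreed rule that only an explicit "utf-16" label consults a BOM while the default is plain UTF-8 with no sniffing, and the manual approach of peeking at the start of the buffer for a BOM. The names `sniffBom` and `chooseEncoding` are hypothetical helpers, not part of the encoding API, and falling back to little-endian for BOM-less "utf-16" input is an assumption made only for illustration.

```typescript
// Hypothetical helper types; not part of any encoding API.
type BomEncoding = "utf-8" | "utf-16le" | "utf-16be";

interface BomResult {
  encoding: BomEncoding | null; // null when no BOM is present
  length: number;               // number of BOM bytes (0 if none)
}

// Peek at the first bytes of a buffer for a UTF-8 or UTF-16 byte order mark.
function sniffBom(bytes: Uint8Array): BomResult {
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return { encoding: "utf-8", length: 3 };    // EF BB BF
  }
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) {
    return { encoding: "utf-16le", length: 2 }; // FF FE
  }
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) {
    return { encoding: "utf-16be", length: 2 }; // FE FF
  }
  return { encoding: null, length: 0 };
}

// Sketch of the rule agreed in the thread: with no label the result is plain
// UTF-8 and the BOM is never consulted; only an explicit "utf-16" label
// (the "useBOM" case) looks at the BOM to pick the byte order.
function chooseEncoding(label: string | undefined, bytes: Uint8Array): string {
  if (label === undefined) {
    return "utf-8"; // no Unicode autodetection by default
  }
  if (label === "utf-16") {
    const bom = sniffBom(bytes);
    if (bom.encoding === "utf-16le" || bom.encoding === "utf-16be") {
      return bom.encoding;
    }
    return "utf-16le"; // assumption: little-endian when no BOM is found
  }
  return label; // any other explicit label is used as given
}
```

A caller who needs to preserve a UTF-8 BOM (the Notepad case) could check `sniffBom(new Uint8Array(buffer)).encoding === "utf-8"` before decoding and prepend `"\uFEFF"` when writing the text back out.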
