Re: [whatwg] API for encoding/decoding ArrayBuffers into text
On Tue, Mar 13, 2012 at 8:19 PM, Glenn Maynard wrote:

> Using Views instead of specifying the offset and length sounds good.
>
> On Tue, Mar 13, 2012 at 6:28 PM, Ian Hickson wrote:
>
> > - What's the use case for supporting anything but UTF-8?
>
> Other Unicode encodings may be useful, to decode existing file formats
> containing (most likely at a minimum) UTF-16. I don't feel strongly about
> that, though; we're stuck with UTF-16 as an internal representation in the
> platform, but that doesn't necessarily mean we need to support it as a
> transfer encoding.
>
> For non-Unicode legacy encodings, I think that even if use cases exist,
> they should be given more than the usual amount of scrutiny before being
> supported.

The whole idea is to be able to extract textual data out of some packed binary format. If you don't support the character sets people want to use, they will simply do as they have to do now and hand-code the character set conversion, where it will be slow and inaccurate.

In particular, I think you have to include various ISO-8859-* character sets (especially Latin1) and the non-Unicode character sets still frequently used by Japanese and Chinese users.

I am fine with strongly suggesting that only UTF-8 be used for new things, but leaving out legacy support will severely limit the utility of this library.

--
John A. Tamplin
Software Engineer (GWT), Google
Re: [whatwg] API for encoding/decoding ArrayBuffers into text
Using Views instead of specifying the offset and length sounds good.

On Tue, Mar 13, 2012 at 6:28 PM, Ian Hickson wrote:

> - What's the use case for supporting anything but UTF-8?

Other Unicode encodings may be useful, to decode existing file formats containing (most likely at a minimum) UTF-16. I don't feel strongly about that, though; we're stuck with UTF-16 as an internal representation in the platform, but that doesn't necessarily mean we need to support it as a transfer encoding.

For non-Unicode legacy encodings, I think that even if use cases exist, they should be given more than the usual amount of scrutiny before being supported.

On Tue, Mar 13, 2012 at 6:38 PM, Tab Atkins Jr. wrote:

> Python throws errors by default, but both functions have an additional
> argument specifying an alternate strategy. In particular,
> bytes.decode can either drop the invalid bytes, replace them with a
> replacement char (which I agree should be U+FFFD), or replace them
> with XML entities; str.encode can choose to drop characters the
> encoding doesn't support.

Supporting throwing is okay if it's really wanted, but the default should be replacement. It reduces fatal errors to (usually) non-fatal replacement, for obscure cases that people generally don't test. It's a much saner default failure mode.

As another option, never throw, but allow returning the number of conversion errors:

    results = encode("abc\uD800def", outputView, "UTF-8");

where results.inputConsumed is the number of words (16-bit code units) consumed from the input string, results.outputWritten is the number of UTF-8 bytes written, and results.errors is 1. That also allows block-by-block conversion; for example, to convert as many complete characters as possible into a fixed-size buffer for transmission, then start again at the next unencoded character.

One more idea, while I'm brainstorming: if outputView is null, allocate an ArrayBuffer of the necessary size, storing it in results.output. That eliminates the need for a separate length pass, without bloating the API with another overload.

On Tue, Mar 13, 2012 at 6:50 PM, Joshua Bell wrote:

> (Cue a strong "nooo!" from Anne.)

(Count me in on that, too. Heuristics bad.)

> Ignoring the issue of invalid code points, the length calculations for
> non-UTF-8 encodings are trivial. (And with the suggestion that UTF-16 not
> be sanitized, that case is trivially 2x the JS string length.)

UTF-16 "sanitization" (replacing mismatched surrogates with U+FFFD) doesn't change the size of the output, actually.

--
Glenn Maynard
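The block-by-block idea in the message above can be sketched with `TextEncoder.encodeInto`, an interface that postdates this thread and is available in present-day browsers and Node.js. This is only an illustration under that assumption: `encodeChunk` and `encodeInBlocks` are hypothetical names, the result fields borrow the names from the message, and there is no `errors` count because `encodeInto` does not report one (lone surrogates are encoded as U+FFFD).

```javascript
// Hypothetical wrapper: maps encodeInto()'s { read, written } onto the
// field names suggested in the message. Not part of any standard.
function encodeChunk(input, outputView) {
  const { read, written } = new TextEncoder().encodeInto(input, outputView);
  return { inputConsumed: read, outputWritten: written };
}

// Convert a string block by block through one fixed-size buffer.
// Each block holds as many complete characters as fit; the next
// iteration resumes at the first unencoded character.
function encodeInBlocks(input, blockSize) {
  const block = new Uint8Array(blockSize);
  const chunks = [];
  let rest = input;
  while (rest.length > 0) {
    const { inputConsumed, outputWritten } = encodeChunk(rest, block);
    if (inputConsumed === 0) {
      throw new RangeError("block too small to hold one character");
    }
    chunks.push(block.slice(0, outputWritten)); // copy out the filled part
    rest = rest.slice(inputConsumed);
  }
  return chunks;
}

// "a" needs 1 byte, U+20AC needs 3, "b" needs 1.
const blocks = encodeInBlocks("a\u20acb", 3);
```

Because `encodeInto` never writes a partial character, a block may end up shorter than the buffer, which is the behaviour the message asks for.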
Re: [whatwg] API for encoding/decoding ArrayBuffers into text
On Tue, 13 Mar 2012, Joshua Bell wrote:
>
> For both of the above: initially suggested use cases included parsing
> data as esoteric as ID3 tags in MP3 files, where the encoding is
> unspecified and is guessed at by decoders, and includes non-Unicode
> encodings. It was suggested that the encoding sniffing capabilities of
> browsers be leveraged. [...]
>
> Whether we should restrict it as far as UTF-8 depends on whether we
> envision this API only used for parsing/serializing newly defined data
> formats, or whether there is consideration for interop with previously
> existing data formats and code.

Seems reasonable. If we have specific use cases for non-UTF-8 encodings, I agree we should support them; if that's the case, we should survey those use cases to work out what the set of encodings we need is, and add just those.

> > - Having a mechanism that lets you encode the string and get a length
> > separate from the mechanism that lets you encode the string and get the
> > encoded string seems like it would encourage very inefficient code. Can
> > we instead have a mechanism that returns both at once? Or is the idea
> > that for some encodings getting the encoded length is much quicker than
> > getting the actual string?
>
> The use case was to compute the size necessary to allocate a single buffer
> into which may be encoded multiple strings and other data, rather than
> allocating multiple small buffers and then copying strings into a larger
> buffer.
>
> Ignoring the issue of invalid code points, the length calculations for
> non-UTF-8 encodings are trivial. (And with the suggestion that UTF-16 not
> be sanitized, that case is trivially 2x the JS string length.)

Yeah, but surely we'll mainly be doing stuff with UTF-8...

One option is to return an opaque object of the form:

    interface EncodedString {
      readonly attribute unsigned long length;
      // internally has a copy of the encoded string
    };

...and then have view.setString take this EncodedString object. At least then you get it down to an extraneous copy, rather than an extraneous encode. Still not ideal though.

> > - Seems weird that integers and strings would have such different APIs
> > for doing the same thing. Why can't we handle them equivalently? As in:
> >
> > len = view.setString(strings[i],
> > offset + Uint32Array.BYTES_PER_ELEMENT,
> > "UTF-8");
> > view.setUint32(offset, len);
> > offset += Uint32Array.BYTES_PER_ELEMENT + len;
>
> Heh, that's where the discussion started, actually. We wanted to keep
> the DataView interface simple, and potentially support encoding into
> plain JS arrays and/or non-TypedArray support that appeared to be on the
> horizon for JS.

I see where you're coming from, but I think we should look at the platform as a whole, not just one API. It doesn't help the platform as a whole if we just have the same features split across two interfaces; the complexity is even slightly higher than just having one consistent API that does ints and strings equivalently.

--
Ian Hickson U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
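To make the length-prefixed layout from the quoted `view.setString` snippet concrete, here is a minimal sketch. `DataView` has no `setString` method, so `setString` below is a hypothetical free function standing in for the proposal, and `writeStrings` and `readStrings` are illustrative names as well. The sketch assumes UTF-8 only and uses `TextEncoder.encodeInto` and `TextDecoder`, which were standardised after this thread.

```javascript
// Hypothetical stand-in for the proposed view.setString(): encodes `str`
// as UTF-8 into `view` at `byteOffset` and returns the byte count written.
function setString(view, str, byteOffset) {
  const target = new Uint8Array(
    view.buffer,
    view.byteOffset + byteOffset,
    view.byteLength - byteOffset
  );
  const { read, written } = new TextEncoder().encodeInto(str, target);
  if (read < str.length) throw new RangeError("destination too small");
  return written;
}

// Same loop as in the quoted snippet: a uint32 length, then the bytes.
function writeStrings(view, strings) {
  let offset = 0;
  for (const s of strings) {
    const len = setString(view, s, offset + Uint32Array.BYTES_PER_ELEMENT);
    view.setUint32(offset, len);
    offset += Uint32Array.BYTES_PER_ELEMENT + len;
  }
  return offset; // total bytes used
}

// Reads `count` length-prefixed strings back out.
function readStrings(view, count) {
  const decoder = new TextDecoder("utf-8");
  const out = [];
  let offset = 0;
  for (let i = 0; i < count; i++) {
    const len = view.getUint32(offset);
    offset += Uint32Array.BYTES_PER_ELEMENT;
    out.push(decoder.decode(new Uint8Array(view.buffer, view.byteOffset + offset, len)));
    offset += len;
  }
  return out;
}

const packed = new DataView(new ArrayBuffer(32));
const used = writeStrings(packed, ["hi", "\u20ac"]);
const roundTrip = readStrings(packed, 2);
```

Returning the byte count from the write call is what lets the length prefix be filled in without a second encoding pass, which is the inefficiency the message is concerned about.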
Re: [whatwg] API for encoding/decoding ArrayBuffers into text
On Tue, Mar 13, 2012 at 4:28 PM, Ian Hickson wrote:

> On Tue, 13 Mar 2012, Joshua Bell wrote:
> > On Tue, Mar 13, 2012 at 4:10 PM, Jonas Sicking wrote:
> > > On Tue, Mar 13, 2012 at 4:08 PM, Kenneth Russell
> > > wrote:
> > > > Joshua Bell has been working on a string encoding and decoding API
> > > > that supports the needed encodings, and which is separable from the
> > > > core typed array API:
> > > >
> > > > http://wiki.whatwg.org/wiki/StringEncoding
> > > >
> > > > This is the direction I prefer. String encoding and decoding seems
> > > > to be a complex enough problem that it should be expressed
> > > > separately from the typed array spec itself.
>
> Some quick feedback:
>
> - [OmitConstructor] doesn't seem to be WebIDL

Historically, the spec started off as an addition to the Typed Array spec that splintered off; cleanup is definitely needed, thanks.

> - please don't allow UAs to implement other encodings. You should list
> the exact set of supported encodings and the exact labels that should
> be recognised as meaning those encodings, and disallow all others.
> Otherwise, we'll be in a never-ending game of reverse-engineering each
> others' lists of supported encodings and it'll keep growing.
>
> - What's the use case for supporting anything but UTF-8?

For both of the above: initially suggested use cases included parsing data as esoteric as ID3 tags in MP3 files, where the encoding is unspecified and is guessed at by decoders, and includes non-Unicode encodings. It was suggested that the encoding sniffing capabilities of browsers be leveraged. (Cue a strong "nooo!" from Anne.)

I completely agree that we should explicitly list the set of encodings supported and should remove the "other encodings" allowance.

Whether we should restrict it as far as UTF-8 depends on whether we envision this API only used for parsing/serializing newly defined data formats, or whether there is consideration for interop with previously existing data formats and code. For example, "BINARY" would be used to bridge the existing atob()/btoa() methods with Typed Arrays (although base64 directly in/out of Typed Arrays would be preferable).

Jonas, since you started this thread - did your content authors mention encodings?

> - Having a mechanism that lets you encode the string and get a length
> separate from the mechanism that lets you encode the string and get the
> encoded string seems like it would encourage very inefficient code. Can
> we instead have a mechanism that returns both at once? Or is the idea
> that for some encodings getting the encoded length is much quicker than
> getting the actual string?

The use case was to compute the size necessary to allocate a single buffer into which may be encoded multiple strings and other data, rather than allocating multiple small buffers and then copying strings into a larger buffer.

Ignoring the issue of invalid code points, the length calculations for non-UTF-8 encodings are trivial. (And with the suggestion that UTF-16 not be sanitized, that case is trivially 2x the JS string length.)

> - Seems weird that integers and strings would have such different APIs
> for doing the same thing. Why can't we handle them equivalently? As in:
>
> len = view.setString(strings[i],
> offset + Uint32Array.BYTES_PER_ELEMENT,
> "UTF-8");
> view.setUint32(offset, len);
> offset += Uint32Array.BYTES_PER_ELEMENT + len;

Heh, that's where the discussion started, actually. We wanted to keep the DataView interface simple, and potentially support encoding into plain JS arrays and/or non-TypedArray support that appeared to be on the horizon for JS.

> HTH,
> --
> Ian Hickson U+1047E)\._.,--,'``.fL
> http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,.
> Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
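The size-then-allocate use case described in the message above (one length pass, one allocation, then encode everything into the same buffer) might look roughly like the following for UTF-8. `utf8Length` and `packStrings` are hypothetical helper names, not part of the wiki draft, and the sketch assumes lone surrogates are encoded as U+FFFD (three bytes), which is what `TextEncoder` does.

```javascript
// Hypothetical helper: UTF-8 byte length of a JS string, computed
// without producing the encoded bytes.
function utf8Length(str) {
  let bytes = 0;
  for (const ch of str) { // the string iterator walks code points
    const cp = ch.codePointAt(0);
    if (cp < 0x80) bytes += 1;
    else if (cp < 0x800) bytes += 2;
    else if (cp < 0x10000) bytes += 3; // also covers lone surrogates (U+FFFD)
    else bytes += 4;
  }
  return bytes;
}

// Allocate one buffer of exactly the right size, then encode each
// string directly into it (no intermediate per-string buffers).
function packStrings(strings) {
  const total = strings.reduce((n, s) => n + utf8Length(s), 0);
  const out = new Uint8Array(total);
  const encoder = new TextEncoder();
  let offset = 0;
  for (const s of strings) {
    offset += encoder.encodeInto(s, out.subarray(offset)).written;
  }
  return out;
}

const sample = "a\u00e9\u20ac\u{1F600}"; // 1 + 2 + 3 + 4 bytes
const sampleLength = utf8Length(sample);
const packedBytes = packStrings(["a", "\u20ac"]);
```

For UTF-8 the length pass still has to visit every code point, which is the cost the feedback about a separate length mechanism is pointing at.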
Re: [whatwg] API for encoding/decoding ArrayBuffers into text
On Tue, Mar 13, 2012 at 4:11 PM, Glenn Maynard wrote:
> The API on that wiki page is a reasonable start. For the same reasons that
> we discussed in a recent thread (
> http://lists.w3.org/Archives/Public/public-webapps/2011JulSep/1589.html),
> conversion errors should use replacement (eg. U+FFFD), not throw
> exceptions.

Python throws errors by default, but both functions have an additional argument specifying an alternate strategy. In particular, bytes.decode can either drop the invalid bytes, replace them with a replacement char (which I agree should be U+FFFD), or replace them with XML entities; str.encode can choose to drop characters the encoding doesn't support.

~TJ
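For comparison with the Python strategies mentioned above, the two behaviours under discussion (replace versus throw) can be sketched with the `TextDecoder` interface found in present-day browsers and Node.js. That interface postdates this thread, so this illustrates the trade-off rather than the API being drafted here.

```javascript
// "a", one byte that can never appear in UTF-8, then "b".
const invalidInput = new Uint8Array([0x61, 0xff, 0x62]);

// Replacement (the default): the invalid byte becomes U+FFFD and
// decoding carries on.
const replaced = new TextDecoder("utf-8").decode(invalidInput);

// Throwing: opt in with { fatal: true }; decode() then raises an
// exception instead of substituting.
let threw = false;
try {
  new TextDecoder("utf-8", { fatal: true }).decode(invalidInput);
} catch (e) {
  threw = true;
}
```

With replacement as the default, malformed input in a rarely tested path degrades to a visible U+FFFD rather than an uncaught exception.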
Re: [whatwg] API for encoding/decoding ArrayBuffers into text
On Tue, Mar 13, 2012 at 4:08 PM, Jonas Sicking wrote: > On Tue, Mar 13, 2012 at 3:58 PM, Tab Atkins Jr. wrote: >> On Tue, Mar 13, 2012 at 3:49 PM, Jonas Sicking wrote: >>> Hi All, >>> >>> Something that has come up a couple of times with content authors >>> lately has been the desire to convert an ArrayBuffer (or part thereof) >>> into a decoded string. Similarly being able to encode a string into an >>> ArrayBuffer (or part thereof). >>> >>> Something as simple as >>> >>> DOMString decode(ArrayBufferView source, DOMString encoding); >>> ArrayBufferView encode(DOMString source, DOMString encoding, >>> [optional] ArrayBufferView destination); >>> >>> would go a very long way. The question is where to stick these >>> functions. Internationalization doesn't have a obvious object we can >>> hang functions off of (unlike, for example crypto), and the above >>> names are much too generic to turn into global functions. >>> >>> Ideas/opinions/bikesheds? >> >> Python3 just defines str.encode and bytes.decode. Can we not do this >> with String.encode and ArrayBuffer.decode? > > Unfortunately I suspect getting anything added on the String object > will take a few years given that it's too late to get into ES6 (and in > any case I suspect adding ArrayBuffer dependencies to ES6 would be > controversial). Like Ian said, I don't see anything particularly bad about the spec defining ArrayBuffers to define an ArrayBuffer-related method on String. There's no reason it has to be in the ES spec. ~TJ
Re: [whatwg] API for encoding/decoding ArrayBuffers into text
On Tue, 13 Mar 2012, Joshua Bell wrote:
>
> WHATWG makes sense, I just hadn't gotten around to shopping for a home.
> (Administrivia: Is there need to propose a charter addition?)

You're welcome to use the WHATWG list for this. Charters are pointless and there's no need to worry about them here.

--
Ian Hickson U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] API for encoding/decoding ArrayBuffers into text
On Tue, 13 Mar 2012, Joshua Bell wrote:
> On Tue, Mar 13, 2012 at 4:10 PM, Jonas Sicking wrote:
> > On Tue, Mar 13, 2012 at 4:08 PM, Kenneth Russell
> > wrote:
> > > Joshua Bell has been working on a string encoding and decoding API
> > > that supports the needed encodings, and which is separable from the
> > > core typed array API:
> > >
> > > http://wiki.whatwg.org/wiki/StringEncoding
> > >
> > > This is the direction I prefer. String encoding and decoding seems
> > > to be a complex enough problem that it should be expressed
> > > separately from the typed array spec itself.

Some quick feedback:

 - [OmitConstructor] doesn't seem to be WebIDL

 - please don't allow UAs to implement other encodings. You should list the exact set of supported encodings and the exact labels that should be recognised as meaning those encodings, and disallow all others. Otherwise, we'll be in a never-ending game of reverse-engineering each others' lists of supported encodings and it'll keep growing.

 - What's the use case for supporting anything but UTF-8?

 - Having a mechanism that lets you encode the string and get a length separate from the mechanism that lets you encode the string and get the encoded string seems like it would encourage very inefficient code. Can we instead have a mechanism that returns both at once? Or is the idea that for some encodings getting the encoded length is much quicker than getting the actual string?

 - Seems weird that integers and strings would have such different APIs for doing the same thing. Why can't we handle them equivalently? As in:

     len = view.setString(strings[i],
                          offset + Uint32Array.BYTES_PER_ELEMENT,
                          "UTF-8");
     view.setUint32(offset, len);
     offset += Uint32Array.BYTES_PER_ELEMENT + len;

HTH,
--
Ian Hickson U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] API for encoding/decoding ArrayBuffers into text
On Tue, Mar 13, 2012 at 4:11 PM, Glenn Maynard wrote: > On Tue, Mar 13, 2012 at 5:49 PM, Jonas Sicking wrote: > > > Something that has come up a couple of times with content authors > > lately has been the desire to convert an ArrayBuffer (or part thereof) > > into a decoded string. Similarly being able to encode a string into an > > ArrayBuffer (or part thereof). > > > > There was discussion about this before: > > > https://www.khronos.org/webgl/public-mailing-list/archives//msg00017.html > http://wiki.whatwg.org/wiki/StringEncoding > > (I don't know why it was on the WebGL list; typed arrays are becoming > infrastructural and this doesn't seem like it belongs there, even though > ArrayBuffer was started there.) > Purely historical; early adopters of Typed Arrays were folks prototyping with WebGL who wanted to parse data files containing strings. WHATWG makes sense, I just hadn't gotten around to shopping for a home. (Administrivia: Is there need to propose a charter addition?) > The API on that wiki page is a reasonable start. For the same reasons that > we discussed in a recent thread ( > http://lists.w3.org/Archives/Public/public-webapps/2011JulSep/1589.html), > conversion errors should use replacement (eg. U+FFFD), not throw > exceptions. The "any" arguments should be fixed. Encoding to UTF-16 > should definitely not prefix a BOM, and UTF-16 having unspecified > endianness is obviously bad. > > I'd also suggest that, unless there's serious, substantiated demand for > it--which I doubt--only major Unicode encodings be supported. Don't make > it easier for people to keep using legacy encodings. > > Two other pieces of feedback I received from Adam Barth off list: * take ArrayBufferView as input which both fixes "any" and simplifies the API to eliminate byteOffset and byteLength * support two versions of encode, one which takes a target ArrayBufferView, and one which allocates/returns a new Uint8Array of the appropriate length. 
> > Shouldn't this just be another ArrayBufferView type with special > > semantics, like Uint8ClampedArray? DOMStringArray or some such? And/or a > > getString()/setString() method pair on DataView? > > I don't think so, because retrieving the N'th decoded/reencoded character > isn't a constant-time operation. > > -- > Glenn Maynard >
Re: [whatwg] API for encoding/decoding ArrayBuffers into text
On Tue, Mar 13, 2012 at 4:11 PM, Glenn Maynard wrote: > On Tue, Mar 13, 2012 at 5:49 PM, Jonas Sicking wrote: > > > Something that has come up a couple of times with content authors > > lately has been the desire to convert an ArrayBuffer (or part thereof) > > into a decoded string. Similarly being able to encode a string into an > > ArrayBuffer (or part thereof). > > > > There was discussion about this before: > > > https://www.khronos.org/webgl/public-mailing-list/archives//msg00017.html > http://wiki.whatwg.org/wiki/StringEncoding > > (I don't know why it was on the WebGL list; typed arrays are becoming > infrastructural and this doesn't seem like it belongs there, even though > ArrayBuffer was started there.) > > The API on that wiki page is a reasonable start. For the same reasons that > we discussed in a recent thread ( > http://lists.w3.org/Archives/Public/public-webapps/2011JulSep/1589.html), > conversion errors should use replacement (eg. U+FFFD), not throw > exceptions. The "any" arguments should be fixed. Encoding to UTF-16 > should definitely not prefix a BOM, and UTF-16 having unspecified > endianness is obviously bad. > > I'd also suggest that, unless there's serious, substantiated demand for > it--which I doubt--only major Unicode encodings be supported. Don't make > it easier for people to keep using legacy encodings. > > Two other pieces of feedback I received from Adam Barth off list: * take ArrayBufferView as input which both fixes "any" and simplifies the API to eliminate byteOffset and byteLength * support two versions of encode, one which takes a target ArrayBufferView, and one which allocates/returns a new Uint8Array of the appropriate length. > > Shouldn't this just be another ArrayBufferView type with special > > semantics, like Uint8ClampedArray? DOMStringArray or some such? And/or a > > getString()/setString() method pair on DataView? 
> > I don't think so, because retrieving the N'th decoded/reencoded character > isn't a constant-time operation. > > -- > Glenn Maynard >
Re: [whatwg] API for encoding/decoding ArrayBuffers into text
On Tue, 13 Mar 2012, Jonas Sicking wrote:
>
> Unfortunately I suspect getting anything added on the String object will
> take a few years given that it's too late to get into ES6 (and in any
> case I suspect adding ArrayBuffer dependencies to ES6 would be
> controversial).

We can just define it outside the ES spec.

--
Ian Hickson U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] API for encoding/decoding ArrayBuffers into text
On Tue, Mar 13, 2012 at 6:10 PM, Jonas Sicking wrote:
> On Tue, Mar 13, 2012 at 4:08 PM, Kenneth Russell wrote:
>> Joshua Bell has been working on a string encoding and decoding API
>> that supports the needed encodings, and which is separable from the
>> core typed array API:
>>
>> http://wiki.whatwg.org/wiki/StringEncoding
>>
>> This is the direction I prefer. String encoding and decoding seems to
>> be a complex enough problem that it should be expressed separately
>> from the typed array spec itself.
>
> Very cool. Where do I provide feedback to this? Here?

This list seems like a good place to discuss it.

-Ken
Re: [whatwg] API for encoding/decoding ArrayBuffers into text
On Tue, Mar 13, 2012 at 5:49 PM, Jonas Sicking wrote:

> Something that has come up a couple of times with content authors
> lately has been the desire to convert an ArrayBuffer (or part thereof)
> into a decoded string. Similarly being able to encode a string into an
> ArrayBuffer (or part thereof).

There was discussion about this before:

https://www.khronos.org/webgl/public-mailing-list/archives//msg00017.html
http://wiki.whatwg.org/wiki/StringEncoding

(I don't know why it was on the WebGL list; typed arrays are becoming infrastructural and this doesn't seem like it belongs there, even though ArrayBuffer was started there.)

The API on that wiki page is a reasonable start. For the same reasons that we discussed in a recent thread (http://lists.w3.org/Archives/Public/public-webapps/2011JulSep/1589.html), conversion errors should use replacement (eg. U+FFFD), not throw exceptions. The "any" arguments should be fixed. Encoding to UTF-16 should definitely not prefix a BOM, and UTF-16 having unspecified endianness is obviously bad.

I'd also suggest that, unless there's serious, substantiated demand for it--which I doubt--only major Unicode encodings be supported. Don't make it easier for people to keep using legacy encodings.

> Shouldn't this just be another ArrayBufferView type with special
> semantics, like Uint8ClampedArray? DOMStringArray or some such? And/or a
> getString()/setString() method pair on DataView?

I don't think so, because retrieving the N'th decoded/reencoded character isn't a constant-time operation.

--
Glenn Maynard
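The point about the N'th character not being a constant-time lookup can be illustrated for UTF-8: because characters occupy one to four bytes, finding where the N'th one starts means scanning from the beginning. The sketch below assumes well-formed UTF-8 input, and `byteOffsetOfCodePoint` is a hypothetical name used only for this illustration.

```javascript
// Returns the byte offset at which the n-th (0-based) code point starts,
// or -1 if the buffer holds fewer code points. The loop is O(length):
// there is no way to jump straight to the answer as with a Uint8Array index.
function byteOffsetOfCodePoint(bytes, n) {
  let seen = 0;
  for (let i = 0; i < bytes.length; i++) {
    // Continuation bytes have the form 10xxxxxx; any other byte
    // begins a new code point.
    if ((bytes[i] & 0xc0) !== 0x80) {
      if (seen === n) return i;
      seen++;
    }
  }
  return -1;
}

// "a" (1 byte), U+20AC (3 bytes), "b" (1 byte)
const utf8Sample = new Uint8Array([0x61, 0xe2, 0x82, 0xac, 0x62]);
const offsetOfThird = byteOffsetOfCodePoint(utf8Sample, 2);
```

An indexed view type such as the suggested DOMStringArray would have to hide a scan like this behind every element access.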
Re: [whatwg] API for encoding/decoding ArrayBuffers into text
On Tue, Mar 13, 2012 at 4:08 PM, Kenneth Russell wrote:
> Joshua Bell has been working on a string encoding and decoding API
> that supports the needed encodings, and which is separable from the
> core typed array API:
>
> http://wiki.whatwg.org/wiki/StringEncoding
>
> This is the direction I prefer. String encoding and decoding seems to
> be a complex enough problem that it should be expressed separately
> from the typed array spec itself.

Very cool. Where do I provide feedback to this? Here?

/ Jonas
Re: [whatwg] API for encoding/decoding ArrayBuffers into text
On Tue, Mar 13, 2012 at 3:58 PM, Tab Atkins Jr. wrote: > On Tue, Mar 13, 2012 at 3:49 PM, Jonas Sicking wrote: >> Hi All, >> >> Something that has come up a couple of times with content authors >> lately has been the desire to convert an ArrayBuffer (or part thereof) >> into a decoded string. Similarly being able to encode a string into an >> ArrayBuffer (or part thereof). >> >> Something as simple as >> >> DOMString decode(ArrayBufferView source, DOMString encoding); >> ArrayBufferView encode(DOMString source, DOMString encoding, >> [optional] ArrayBufferView destination); >> >> would go a very long way. The question is where to stick these >> functions. Internationalization doesn't have a obvious object we can >> hang functions off of (unlike, for example crypto), and the above >> names are much too generic to turn into global functions. >> >> Ideas/opinions/bikesheds? > > Python3 just defines str.encode and bytes.decode. Can we not do this > with String.encode and ArrayBuffer.decode? Unfortunately I suspect getting anything added on the String object will take a few years given that it's too late to get into ES6 (and in any case I suspect adding ArrayBuffer dependencies to ES6 would be controversial). / Jonas
Re: [whatwg] API for encoding/decoding ArrayBuffers into text
Joshua Bell has been working on a string encoding and decoding API that supports the needed encodings, and which is separable from the core typed array API: http://wiki.whatwg.org/wiki/StringEncoding This is the direction I prefer. String encoding and decoding seems to be a complex enough problem that it should be expressed separately from the typed array spec itself. -Ken On Tue, Mar 13, 2012 at 5:59 PM, Ian Hickson wrote: > On Tue, 13 Mar 2012, Jonas Sicking wrote: >> >> Something that has come up a couple of times with content authors >> lately has been the desire to convert an ArrayBuffer (or part thereof) >> into a decoded string. Similarly being able to encode a string into an >> ArrayBuffer (or part thereof). >> >> Something as simple as >> >> DOMString decode(ArrayBufferView source, DOMString encoding); >> ArrayBufferView encode(DOMString source, DOMString encoding, >> [optional] ArrayBufferView destination); >> >> would go a very long way. The question is where to stick these >> functions. Internationalization doesn't have a obvious object we can >> hang functions off of (unlike, for example crypto), and the above >> names are much too generic to turn into global functions. > > Shouldn't this just be another ArrayBufferView type with special > semantics, like Uint8ClampedArray? DOMStringArray or some such? And/or a > getString()/setString() method pair on DataView? > > Incidentally I _strongly_ suggest we only support UTF-8 here. > > -- > Ian Hickson U+1047E )\._.,--,'``. fL > http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,. > Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] API for encoding/decoding ArrayBuffers into text
On Tue, 13 Mar 2012, Jonas Sicking wrote: > > Something that has come up a couple of times with content authors > lately has been the desire to convert an ArrayBuffer (or part thereof) > into a decoded string. Similarly being able to encode a string into an > ArrayBuffer (or part thereof). > > Something as simple as > > DOMString decode(ArrayBufferView source, DOMString encoding); > ArrayBufferView encode(DOMString source, DOMString encoding, > [optional] ArrayBufferView destination); > > would go a very long way. The question is where to stick these > functions. Internationalization doesn't have a obvious object we can > hang functions off of (unlike, for example crypto), and the above > names are much too generic to turn into global functions. Shouldn't this just be another ArrayBufferView type with special semantics, like Uint8ClampedArray? DOMStringArray or some such? And/or a getString()/setString() method pair on DataView? Incidentally I _strongly_ suggest we only support UTF-8 here. -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] API for encoding/decoding ArrayBuffers into text
On Tue, Mar 13, 2012 at 3:49 PM, Jonas Sicking wrote: > Hi All, > > Something that has come up a couple of times with content authors > lately has been the desire to convert an ArrayBuffer (or part thereof) > into a decoded string. Similarly being able to encode a string into an > ArrayBuffer (or part thereof). > > Something as simple as > > DOMString decode(ArrayBufferView source, DOMString encoding); > ArrayBufferView encode(DOMString source, DOMString encoding, > [optional] ArrayBufferView destination); > > would go a very long way. The question is where to stick these > functions. Internationalization doesn't have a obvious object we can > hang functions off of (unlike, for example crypto), and the above > names are much too generic to turn into global functions. > > Ideas/opinions/bikesheds? Python3 just defines str.encode and bytes.decode. Can we not do this with String.encode and ArrayBuffer.decode? ~TJ
[whatwg] API for encoding/decoding ArrayBuffers into text
Hi All,

Something that has come up a couple of times with content authors lately has been the desire to convert an ArrayBuffer (or part thereof) into a decoded string. Similarly being able to encode a string into an ArrayBuffer (or part thereof).

Something as simple as

    DOMString decode(ArrayBufferView source, DOMString encoding);
    ArrayBufferView encode(DOMString source, DOMString encoding,
                           [optional] ArrayBufferView destination);

would go a very long way. The question is where to stick these functions. Internationalization doesn't have an obvious object we can hang functions off of (unlike, for example, crypto), and the above names are much too generic to turn into global functions.

Ideas/opinions/bikesheds?

/ Jonas
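As a rough illustration of the two signatures proposed above, here is a sketch built on the `TextDecoder` and `TextEncoder` interfaces that browsers and Node.js ship today (they postdate this message). The function names come from the proposal; the bodies are assumptions. In particular, `encode` drops the `encoding` argument because `TextEncoder` only produces UTF-8.

```javascript
// decode(source, encoding): `source` is any ArrayBufferView (or ArrayBuffer).
// Invalid byte sequences come back as U+FFFD rather than throwing.
function decode(source, encoding = "utf-8") {
  return new TextDecoder(encoding).decode(source);
}

// encode(source[, destination]): UTF-8 only in this sketch.
// Without a destination a new Uint8Array is allocated; with one, the bytes
// are written into it and a view over the written part is returned.
function encode(source, destination) {
  const encoder = new TextEncoder();
  if (destination === undefined) {
    return encoder.encode(source);
  }
  const bytes = new Uint8Array(
    destination.buffer,
    destination.byteOffset,
    destination.byteLength
  );
  const { written } = encoder.encodeInto(source, bytes);
  return bytes.subarray(0, written);
}

const encodedBytes = encode("h\u00e9llo"); // U+00E9 takes 2 bytes in UTF-8
const decodedText = decode(encodedBytes);
const intoExisting = encode("abc", new Uint8Array(8));
```

Note that the destination form silently truncates at a character boundary if the view is too small; a real API would need to say what happens in that case.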
Re: [whatwg] Validator.nu: "Attribute role not allowed on element h2 at this point."
2012/3/13 Scott González : > It's my understanding that authors should only apply ARIA via script. No. Where do you understand that from? > The > redundancy cases seem to be the most reasonable use cases I've heard of for > wanting ARIA in the initial markup, but even that seems wrong. What happens > when you have type=range and role=slider, the UA doesn't understand the new > types, and the script either never loads or has an error? The AT will pick > up the role, but none of the functionality will be there. I don't see how > that's better than not having the role applied. First, there are ARIA annotations that do not depend on JS but do overlap with native semantics, e.g.: Second, plenty of authors do produce HTML that do not work without JS. This isn't good practice, but there's little reason to discourage use of ARIA markup _especially_ in those cases, where we do not discourage other JS-dependent initial markup like (say): (without an associated form) or etc. -- Benjamin Hawkes-Lewis
Re: [whatwg] Validator.nu: "Attribute role not allowed on element h2 at this point."
On Tue, 13 Mar 2012 15:57:57 -, Hugh Guiney wrote:

>> The validator is probably just not up to date.
>>
>> Note that in this case the validator is probably right. If it's just
>> presentational, why are you using ? It doesn't seem presentational to
>> me. I think you are incorrectly using role=presentational here.
>
> I am using it because VoiceOver does not understand /document outlines
> yet, and so announces two headings when there should only be one. It is
> not ideal markup; I'm merely trying to provide a better experience for AT
> users until new elements and parsing models are understood.

Dusting off proposal, which doesn't have that problem:

http://www.w3.org/html/wg/wiki/ChangeProposals/hSub

--
regards, Kornel Lesiński
Re: [whatwg] Validator.nu: "Attribute role not allowed on element h2 at this point."
On Mon, Mar 12, 2012 at 8:52 PM, Ian Hickson wrote:
> The validator is probably just not up to date.
>
> Note that in this case the validator is probably right. If it's just
> presentational, why are you using ? It doesn't seem presentational to
> me. I think you are incorrectly using role=presentational here.

I am using it because VoiceOver does not understand /document outlines yet, and so announces two headings when there should only be one. It is not ideal markup; I'm merely trying to provide a better experience for AT users until new elements and parsing models are understood.
Re: [whatwg] Validator.nu: "Attribute role not allowed on element h2 at this point."
It's my understanding that authors should only apply ARIA via script. The redundancy cases seem to be the most reasonable use cases I've heard of for wanting ARIA in the initial markup, but even that seems wrong. What happens when you have type=range and role=slider, the UA doesn't understand the new types, and the script either never loads or has an error? The AT will pick up the role, but none of the functionality will be there. I don't see how that's better than not having the role applied.

On Tue, Mar 13, 2012 at 3:11 AM, Charles Pritchard wrote:

> On 3/12/12 11:42 PM, Simon Pieters wrote:
>
>> On Tue, 13 Mar 2012 02:16:29 +0100, Charles Pritchard wrote:
>>
>>>> Warnings are generally not useful. Either something is fine and we
>>>> should support it, or it's wrong and we should alert the author. I
>>>> think "must" is very much the appropriate requirement level here.
>>>
>>> From the implementation-side, the spec is wrong, it ranks native HTML
>>> semantics above ARIA DOM semantics.
>>
>> You're confusing author conformance requirements with UA conformance
>> requirements.
>
> The section did confuse me. It lays out some author requirements then goes
> into what looks like appropriate UA mapping.
> I don't see this working well for ARIA conformance testing, but I do like
> the mapping.
>
> This document tries to set strict requirements on authors for ARIA usage
> which don't exist in practice.
> It's intended to help, but I don't think it's needed; I believe it adds
> confusion.
>
> The Restrictions seem fine for telling vendors that they ought to be
> making their ARIA maps from native HTML to ARIA a certain way.
> But, as you said, I'm getting confused reading this doc.
>
>>> As a "best practices" note, it seems overly optimistic. There are
>>> situations with AT navigation where role conflicts do occur and/or
>>> redundancy in tagging is helpful.
>>
>> Do you have concrete examples?
>
> Concrete? No. I don't have an active JAWS/NVDA/WindowEyes + HTML4 project
> in front of me.
> If I did, I'm sure I'd have some concrete examples of how ARIA and HTML4
> work together with roles.
>
> Some wild guesses:
> Treating a link as a button or a button as a link.
> @disabled and aria-disabled may be used via reference with aria-controls.
> type="range" and role=slider for redundancy.
> Various styling tricks with CSS selectors.
>
> Steve Faulkner posted that sometimes explicit ARIA roles signal to ATs to
> look for more ARIA attributes.
>
> I've used role and/or redundant ARIA within the scripting environment to
> minimize calls in applications checking for roles. Redundancy doesn't harm
> anything, I actively promote it, as it does help, sometimes. Conflicts can
> be a bad thing, they can lead to nonsensical or non-interactive
> reporting by ATs. I realize that, but I'd err on the side of allowing
> authors to make those decisions. They can use various tools that spit out
> warnings.
>
> Ian has stated that warnings aren't very useful, he's looking for error or
> bust. That's confusing when it comes to ARIA testing, as it's more about
> the pragmatic effects of applying semantics and using a variety of ATs to
> test them.
>
>>> I don't believe it is appropriate for HTML to place restrictions on ARIA
>>> DOM. It does not reflect implementations.
>>
>> It does not affect implementations at all.
>
> Then I'm less concerned. My understanding was that this part of the
> specification is intended to affect implementations such that an author's
> use of @role in a tag would be overridden by the browser if that tag is on
> the conflict list.
>
>>> The HTML spec should only specify what the default mappings are for HTML
>>> elements to ARIA.
>>> Authors may be advised to test AT software with their product.
>>>
>>> This statement is more in line with practice: "Authors must test
>>> accessibility tree as part of development and usage of ARIA semantics.".
>>
>> That's not machine checkable so less likely to have an effect at all.
>
> So the "authors must" is for conformance tools? Again, it seems to be
> adding confusion. I'm not the only one.
>
> It looks like a good section explaining mapping to implementers that has
> been turned into a wiffle bat for bopping weary authors on the head.
>
> ARIA is a tool for supporting secondary UAs, not an extension to HTML
> Forms and groups. An aria role does absolutely nothing to alter the
> behavior of the primary UA.
>
> -Charles
Re: [whatwg] [media] startOffsetTime, also add startTime?
On Fri, 09 Mar 2012 15:40:26 +0100, Philip Jägenstedt wrote:

> let me first try to summarize what I think the spec says:
>
> * currentTime need not start at 0, for streams it will typically
>   represent for how long the server has been serving a stream.
> * duration is not the duration, it is the last timestamp of a resource.
> * startOffsetTime is the date at time 0, it's not an offset. It has
>   nothing to do with syncing live streams.
> * initialTime is the first timestamp of the stream or the start time of
>   a media fragment URL, if one is used.
> * For chained streams, the 2nd and subsequent clips have their timelines
>   normalized and appended to the first clip's timeline.

I think this is mostly correct, but Odin pointed out to me this section of the spec:

"In the absence of an explicit timeline, the zero time on the media timeline should correspond to the first frame of the media resource. For static audio and video files this is generally trivial. For streaming resources, if the user agent will be able to seek to an earlier point than the first frame originally provided by the server, then the zero time should correspond to the earliest seekable time of the media resource; otherwise, it should correspond to the first frame received from the server (the point in the media resource at which the user agent began receiving the stream)."

There are multiple problems here, and I think it's responsible for some of the confusion.

* What is an "explicit timeline"? For example, does an Ogg stream that starts with a non-zero timestamp have an explicit timeline?
* Does "For streaming resources ..." apply only in the absence of an explicit timeline, or in general? In other words, what's the scope of "In the absence of an explicit timeline"?
* Why does the spec differentiate between static and streaming resources at all? This is not a distinction Opera makes internally, the only "mode switch" we have depends on whether or not a resource is seekable, which for HTTP means support for byte-range requests. A static resource can be served by a server without support for byte-range requests such that the size and duration are known up front, and I certainly wouldn't call that streaming.

These definitions can be tweaked/clarified in one of two ways:

1. currentTime always reflects the underlying timestamps, such that a resource can start playing at a non-zero offset and seekable.start(0) could be non-zero even for a fully seekable resource. This is what the spec already says, modulo the "streaming resources" weirdness.
2. Always normalize the timeline to start at 0 and end at duration.

I think that the BBC blog post is favoring option 2, and while that's closest to our implementation I don't feel strongly about it. A benefit of option 1 is that currentTime=300 represents the same thing on all clients, which should solve the syncing problem without involving any kinds of dates.

To sum up, here are the spec changes I still think should be made:

* Make it pedantically clear which of the above two options is correct, preferably with a pretty figure of a timeline with all the values clearly marked out.
* Rename startOffsetTime to make it clear that it represents the date at currentTime=0 and document that it's intended primarily for display. I wouldn't object to just dropping it until we expose other kinds of metadata like producer/location, but don't care deeply.
* Drop initialTime.

-- 
Philip Jägenstedt
Core Developer
Opera Software
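To make the difference between the two options concrete, here is a rough TypeScript sketch. It is not spec text and not Opera's implementation: `RawTimeline`, `ExposedTimeline`, `exposeRaw`, `exposeNormalized`, and `wallClock` are hypothetical names, and the model assumes a resource is described only by its first and last timestamps in seconds.

```typescript
// Hypothetical description of the timestamps found in a media resource.
interface RawTimeline {
  firstTimestamp: number; // seconds, first timestamp in the resource
  lastTimestamp: number;  // seconds, last timestamp in the resource
}

// What a script would observe on the media element under each option.
interface ExposedTimeline {
  start: number;    // currentTime at the first frame
  duration: number; // value exposed as `duration`
}

// Option 1: currentTime reflects the underlying timestamps, so playback
// can start at a non-zero time and `duration` is the last timestamp.
function exposeRaw(t: RawTimeline): ExposedTimeline {
  return { start: t.firstTimestamp, duration: t.lastTimestamp };
}

// Option 2: the timeline is normalized to start at 0 and end at duration.
function exposeNormalized(t: RawTimeline): ExposedTimeline {
  return { start: 0, duration: t.lastTimestamp - t.firstTimestamp };
}

// If startOffsetTime is the date at currentTime = 0, a display date for
// any position is that date plus currentTime (converted to milliseconds).
function wallClock(dateAtZero: Date, currentTime: number): Date {
  return new Date(dateAtZero.getTime() + currentTime * 1000);
}
```

For a resource whose timestamps run from 300 to 900, option 1 exposes a timeline from 300 to 900 and option 2 exposes one from 0 to 600. Only under option 1 does a given currentTime value refer to the same point in the stream on every client, which is the syncing benefit mentioned above.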
Re: [whatwg] Validator.nu: "Attribute role not allowed on element h2 at this point."
On 3/12/12 11:42 PM, Simon Pieters wrote:

> On Tue, 13 Mar 2012 02:16:29 +0100, Charles Pritchard wrote:
>
>>> Warnings are generally not useful. Either something is fine and we
>>> should support it, or it's wrong and we should alert the author. I
>>> think "must" is very much the appropriate requirement level here.
>>
>> From the implementation-side, the spec is wrong, it ranks native HTML
>> semantics above ARIA DOM semantics.
>
> You're confusing author conformance requirements with UA conformance
> requirements.

The section did confuse me. It lays out some author requirements then goes into what looks like appropriate UA mapping. I don't see this working well for ARIA conformance testing, but I do like the mapping.

This document tries to set strict requirements on authors for ARIA usage which don't exist in practice. It's intended to help, but I don't think it's needed; I believe it adds confusion.

The Restrictions seem fine for telling vendors that they ought to be making their ARIA maps from native HTML to ARIA a certain way. But, as you said, I'm getting confused reading this doc.

>> As a "best practices" note, it seems overly optimistic. There are
>> situations with AT navigation where role conflicts do occur and/or
>> redundancy in tagging is helpful.
>
> Do you have concrete examples?

Concrete? No. I don't have an active JAWS/NVDA/WindowEyes + HTML4 project in front of me. If I did, I'm sure I'd have some concrete examples of how ARIA and HTML4 work together with roles.

Some wild guesses:
Treating a link as a button or a button as a link.
@disabled and aria-disabled may be used via reference with aria-controls.
type="range" and role=slider for redundancy.
Various styling tricks with CSS selectors.

Steve Faulkner posted that sometimes explicit ARIA roles signal to ATs to look for more ARIA attributes.

I've used role and/or redundant ARIA within the scripting environment to minimize calls in applications checking for roles. Redundancy doesn't harm anything, I actively promote it, as it does help, sometimes. Conflicts can be a bad thing, they can lead to nonsensical or non-interactive reporting by ATs. I realize that, but I'd err on the side of allowing authors to make those decisions. They can use various tools that spit out warnings.

Ian has stated that warnings aren't very useful, he's looking for error or bust. That's confusing when it comes to ARIA testing, as it's more about the pragmatic effects of applying semantics and using a variety of ATs to test them.

>> I don't believe it is appropriate for HTML to place restrictions on ARIA
>> DOM. It does not reflect implementations.
>
> It does not affect implementations at all.

Then I'm less concerned. My understanding was that this part of the specification is intended to affect implementations such that an author's use of @role in a tag would be overridden by the browser if that tag is on the conflict list.

>> The HTML spec should only specify what the default mappings are for HTML
>> elements to ARIA.
>> Authors may be advised to test AT software with their product.
>>
>> This statement is more in line with practice: "Authors must test
>> accessibility tree as part of development and usage of ARIA semantics.".
>
> That's not machine checkable so less likely to have an effect at all.

So the "authors must" is for conformance tools? Again, it seems to be adding confusion. I'm not the only one.

It looks like a good section explaining mapping to implementers that has been turned into a wiffle bat for bopping weary authors on the head.

ARIA is a tool for supporting secondary UAs, not an extension to HTML Forms and groups. An aria role does absolutely nothing to alter the behavior of the primary UA.

-Charles
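The distinction in this message between redundant and conflicting roles, together with Simon's point that a requirement has to be machine checkable to have an effect, can be illustrated with a small lookup sketch in TypeScript. Everything here is an assumption for illustration: `NativeSemantics`, `exampleTable`, and `checkRole` are hypothetical names, and the table entries are made up from examples mentioned in the thread rather than copied from the HTML spec or from Validator.nu.

```typescript
type RoleVerdict = "no-native-mapping" | "redundant" | "allowed" | "conflict";

interface NativeSemantics {
  defaultRole: string;    // role implied by the element with no role attribute
  allowedRoles: string[]; // other roles an author may apply without conflict
}

// Hypothetical table for illustration only; NOT the spec's actual table.
const exampleTable: Record<string, NativeSemantics> = {
  "input[type=range]": { defaultRole: "slider", allowedRoles: [] },
  "a[href]": { defaultRole: "link", allowedRoles: ["button"] },
};

// Classifies an author-supplied role against the element's native semantics.
function checkRole(
  table: Record<string, NativeSemantics>,
  element: string,
  role: string
): RoleVerdict {
  if (!Object.prototype.hasOwnProperty.call(table, element)) {
    return "no-native-mapping";
  }
  const entry = table[element];
  if (role === entry.defaultRole) {
    return "redundant"; // e.g. type=range with role=slider
  }
  return entry.allowedRoles.indexOf(role) !== -1 ? "allowed" : "conflict";
}
```

Whether a tool reports "redundant" or "conflict" as an error, a warning, or nothing at all is exactly the policy question debated in this thread; the sketch only shows that the classification itself is mechanical once a table exists.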