Re: [whatwg] API for encoding/decoding ArrayBuffers into text

2012-03-13 Thread John Tamplin
On Tue, Mar 13, 2012 at 8:19 PM, Glenn Maynard  wrote:

> Using Views instead of specifying the offset and length sounds good.
>
> On Tue, Mar 13, 2012 at 6:28 PM, Ian Hickson  wrote:
>
> >  - What's the use case for supporting anything but UTF-8?
> >
>
> Other Unicode encodings may be useful, to decode existing file formats
> containing (most likely at a minimum) UTF-16.  I don't feel strongly about
> that, though; we're stuck with UTF-16 as an internal representation in the
> platform, but that doesn't necessarily mean we need to support it as a
> transfer encoding.
>
> For non-Unicode legacy encodings, I think that even if use cases exist,
> they should be given more than the usual amount of scrutiny before being
> supported.
>

The whole idea is to be able to extract textual data out of some packed
binary format.  If you don't support the character sets people want to use,
they will simply do what they do now and hand-code the character
set conversion, where it will be slow and inaccurate.
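For reference, even the easy end of that hand-coding looks like this today; a minimal Latin-1 decoder sketch (the function name is illustrative, and multi-byte encodings like UTF-8 or Shift_JIS are considerably harder to get right by hand):

```javascript
// Hand-rolled decoding from a typed array. Latin-1 is the trivial case,
// since bytes 0x00-0xFF map directly to U+0000-U+00FF; the chunking
// avoids blowing the argument limit of Function.prototype.apply.
function decodeLatin1(bytes) {  // bytes: Uint8Array
  var chunks = [];
  for (var i = 0; i < bytes.length; i += 4096) {
    chunks.push(String.fromCharCode.apply(
        null, bytes.subarray(i, i + 4096)));
  }
  return chunks.join("");
}
```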

In particular, I think you have to include various ISO-8859-* character
sets (especially Latin1) and the non-Unicode character sets still
frequently used by Japanese and Chinese users.

I am fine with strongly suggesting that only UTF8 be used for new things,
but leaving out legacy support will severely limit the utility of this
library.

-- 
John A. Tamplin
Software Engineer (GWT), Google


Re: [whatwg] API for encoding/decoding ArrayBuffers into text

2012-03-13 Thread Glenn Maynard
Using Views instead of specifying the offset and length sounds good.

On Tue, Mar 13, 2012 at 6:28 PM, Ian Hickson  wrote:

>  - What's the use case for supporting anything but UTF-8?
>

Other Unicode encodings may be useful, to decode existing file formats
containing (most likely at a minimum) UTF-16.  I don't feel strongly about
that, though; we're stuck with UTF-16 as an internal representation in the
platform, but that doesn't necessarily mean we need to support it as a
transfer encoding.

For non-Unicode legacy encodings, I think that even if use cases exist,
they should be given more than the usual amount of scrutiny before being
supported.



On Tue, Mar 13, 2012 at 6:38 PM, Tab Atkins Jr. wrote:

> Python throws errors by default, but both functions have an additional
> argument specifying an alternate strategy.  In particular,
> bytes.decode can either drop the invalid bytes, replace them with a
> replacement char (which I agree should be U+FFFD), or replace them
> with XML entities; str.encode can choose to drop characters the
> encoding doesn't support.
>

Supporting throwing is okay if it's really wanted, but the default should
be replacement.  It reduces fatal errors to (usually) non-fatal
replacement, for obscure cases that people generally don't test.  It's a
much more sane default failure mode.

As another option, never throw, but allow returning the number of
conversion errors:

results = encode("abc\uD800def", outputView, "UTF-8");

where results.inputConsumed is the number of UTF-16 code units consumed
from the input string, results.outputWritten is the number of UTF-8 bytes
written, and results.errors is 1.

That also allows block-by-block conversion; for example, to convert as many
complete characters as possible into a fixed-size buffer for transmission,
then starting again at the next unencoded character.

One more idea, while I'm brainstorming: if outputView is null, allocate an
ArrayBuffer of the necessary size, storing it in results.output.  That
eliminates the need for a separate length pass, without bloating the API
with another overload.
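As a concrete sketch of that contract (the function name encodeUtf8Into and the exact chunking behavior are illustrative; only the inputConsumed/outputWritten/errors result fields come from the idea above):

```javascript
// Encode as much of `str` into `out` (a Uint8Array) as fits, replacing
// lone surrogates with U+FFFD, and report progress instead of throwing.
function encodeUtf8Into(str, out) {
  var read = 0, written = 0, errors = 0;
  while (read < str.length) {
    var cp = str.charCodeAt(read), len = 1;
    if (cp >= 0xD800 && cp <= 0xDBFF &&
        read + 1 < str.length &&
        str.charCodeAt(read + 1) >= 0xDC00 &&
        str.charCodeAt(read + 1) <= 0xDFFF) {
      // well-formed surrogate pair: combine into one code point
      cp = 0x10000 + ((cp - 0xD800) << 10) +
           (str.charCodeAt(read + 1) - 0xDC00);
      len = 2;
    } else if (cp >= 0xD800 && cp <= 0xDFFF) {
      cp = 0xFFFD;  // lone surrogate: replace and count an error
      errors++;
    }
    var need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
    if (written + need > out.length) break;  // stop at a char boundary
    if (need === 1) {
      out[written++] = cp;
    } else {
      // write the UTF-8 continuation bytes back-to-front
      for (var i = need - 1; i > 0; i--) {
        out[written + i] = 0x80 | (cp & 0x3F);
        cp >>= 6;
      }
      out[written] = [0, 0, 0xC0, 0xE0, 0xF0][need] | cp;
      written += need;
    }
    read += len;
  }
  return { inputConsumed: read, outputWritten: written, errors: errors };
}
```

With a buffer that is too small, the result tells the caller exactly where to resume, which is what makes block-by-block conversion possible.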


On Tue, Mar 13, 2012 at 6:50 PM, Joshua Bell  wrote:

> (Cue a strong "nooo!" from Anne.)
>

(Count me in on that, too.  Heuristics bad.)

> Ignoring the issue of invalid code points, the length calculations for
> non-UTF-8 encodings are trivial. (And with the suggestion that UTF-16 not
> be sanitized, that case is trivially 2x the JS string length.)
>

UTF-16 "sanitization" (replacing mismatched surrogates with U+FFFD) doesn't
change the size of the output, actually.
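A quick illustration of why: a lone surrogate is one UTF-16 code unit, and so is U+FFFD, so the replacement is size-neutral (the example string is arbitrary):

```javascript
// Replacing a lone surrogate with U+FFFD swaps one 16-bit code unit
// for another, so the UTF-16 length (and byte count) is unchanged.
var raw = "abc\uD800def";                     // one unpaired high surrogate
var sane = raw.replace(/\uD800/g, "\uFFFD");  // what sanitization does here
raw.length === sane.length;                   // both are 7 code units
```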

-- 
Glenn Maynard


Re: [whatwg] API for encoding/decoding ArrayBuffers into text

2012-03-13 Thread Ian Hickson
On Tue, 13 Mar 2012, Joshua Bell wrote:
> 
> For both of the above: initially suggested use cases included parsing 
> data as esoteric as ID3 tags in MP3 files, where the encoding is 
> unspecified, guessed at by decoders, and includes non-Unicode encodings. It 
> was suggested that the encoding sniffing capabilities of browsers be 
> leveraged. [...]
> 
> Whether we should restrict it as far as UTF-8 depends on whether we 
> envision this API only used for parsing/serializing newly defined data 
> formats, or whether there is consideration for interop with previously 
> existing data formats and code.

Seems reasonable. If we have specific use cases for non-UTF-8 encodings, I 
agree we should support them; if that's the case, we should survey those 
use cases to work out what the set of encodings we need is, and add just 
those.


> >  - Having a mechanism that lets you encode the string and get a length
> >   separate from the mechanism that lets you encode the string and get the
> >   encoded string seems like it would encourage very inefficient code. Can
> >   we instead have a mechanism that returns both at once? Or is the idea
> >   that for some encodings getting the encoded length is much quicker than
> >   getting the actual string?
> >
> 
> The use case was to compute the size necessary to allocate a single buffer
> into which may be encoded multiple strings and other data, rather than
> allocating multiple small buffers and then copying strings into a larger
> buffer.
> 
> Ignoring the issue of invalid code points, the length calculations for
> non-UTF-8 encodings are trivial. (And with the suggestion that UTF-16 not
> be sanitized, that case is trivially 2x the JS string length.)

Yeah, but surely we'll mainly be doing stuff with UTF-8...

One option is to return an opaque object of the form:

   interface EncodedString {
     readonly attribute unsigned long length;
     // internally has a copy of the encoded string
   };

...and then have view.setString take this EncodedString object. At least 
then you get it down to an extraneous copy, rather than an extraneous 
encode. Still not ideal though.


> >  - Seems weird that integers and strings would have such different APIs
> >   for doing the same thing. Why can't we handle them equivalently? As in:
> >
> > len = view.setString(strings[i],
> >  offset + Uint32Array.BYTES_PER_ELEMENT,
> >  "UTF-8");
> > view.setUint32(offset, len);
> > offset += Uint32Array.BYTES_PER_ELEMENT + len;
> 
> Heh, that's where the discussion started, actually. We wanted to keep 
> the DataView interface simple, and potentially support encoding into 
> plain JS arrays and/or non-TypedArray support that appeared to be on the 
> horizon for JS.

I see where you're coming from, but I think we should look at the platform 
as a whole, not just one API. It doesn't help the platform as a whole if 
we just have the same features split across two interfaces; the complexity 
is even slightly higher than having one consistent API that handles ints 
and strings equivalently.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] API for encoding/decoding ArrayBuffers into text

2012-03-13 Thread Joshua Bell
On Tue, Mar 13, 2012 at 4:28 PM, Ian Hickson  wrote:

> On Tue, 13 Mar 2012, Joshua Bell wrote:
> > On Tue, Mar 13, 2012 at 4:10 PM, Jonas Sicking  wrote:
> > > On Tue, Mar 13, 2012 at 4:08 PM, Kenneth Russell 
> > > wrote:
> > > > Joshua Bell has been working on a string encoding and decoding API
> > > > that supports the needed encodings, and which is separable from the
> > > > core typed array API:
> > > >
> > > > http://wiki.whatwg.org/wiki/StringEncoding
> > > >
> > > > This is the direction I prefer. String encoding and decoding seems
> > > > to be a complex enough problem that it should be expressed
> > > > separately from the typed array spec itself.
>
> Some quick feedback:
>
>  - [OmitConstructor] doesn't seem to be WebIDL
>

Historically, the spec started off as an addition to the Typed Array spec
that splintered off; cleanup is definitely needed, thanks.


>  - please don't allow UAs to implement other encodings. You should list
>   the exact set of supported encodings and the exact labels that should
>   be recognised as meaning those encodings, and disallow all others.
>   Otherwise, we'll be in a never-ending game of reverse-engineering each
>   others' lists of supported encodings and it'll keep growing.
>
>  - What's the use case for supporting anything but UTF-8?
>

For both of the above: initially suggested use cases included parsing data
as esoteric as ID3 tags in MP3 files, where the encoding is unspecified,
guessed at by decoders, and includes non-Unicode encodings. It was
suggested that the encoding sniffing capabilities of browsers be leveraged.
(Cue a strong "nooo!" from Anne.)

I completely agree that we should explicitly list the set of encodings
supported and should remove the "other encodings" allowance.

Whether we should restrict it as far as UTF-8 depends on whether we
envision this API only used for parsing/serializing newly defined data
formats, or whether there is consideration for interop with previously
existing data formats and code. For example, "BINARY" would be used
to bridge the existing atob()/btoa() methods with Typed Arrays (although
base64 directly in/out of Typed Arrays would be preferable).
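For context, the atob()/btoa() bridge mentioned here is usually hand-written along these lines today (a sketch; the helper name is mine):

```javascript
// Decoding base64 into a Uint8Array today goes through an intermediate
// "binary string", one charCodeAt() call per byte.
function base64ToBytes(b64) {
  var bin = atob(b64);                 // base64 -> binary string
  var bytes = new Uint8Array(bin.length);
  for (var i = 0; i < bin.length; i++) {
    bytes[i] = bin.charCodeAt(i);      // each char is a byte value 0-255
  }
  return bytes;
}
```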

Jonas, since you started this thread - did your content authors mention
encodings?


>  - Having a mechanism that lets you encode the string and get a length
>   separate from the mechanism that lets you encode the string and get the
>   encoded string seems like it would encourage very inefficient code. Can
>   we instead have a mechanism that returns both at once? Or is the idea
>   that for some encodings getting the encoded length is much quicker than
>   getting the actual string?
>

The use case was to compute the size necessary to allocate a single buffer
into which may be encoded multiple strings and other data, rather than
allocating multiple small buffers and then copying strings into a larger
buffer.

Ignoring the issue of invalid code points, the length calculations for
non-UTF-8 encodings are trivial. (And with the suggestion that UTF-16 not
be sanitized, that case is trivially 2x the JS string length.)


>  - Seems weird that integers and strings would have such different APIs
>   for doing the same thing. Why can't we handle them equivalently? As in:
>
> len = view.setString(strings[i],
>  offset + Uint32Array.BYTES_PER_ELEMENT,
>  "UTF-8");
> view.setUint32(offset, len);
> offset += Uint32Array.BYTES_PER_ELEMENT + len;
>

Heh, that's where the discussion started, actually. We wanted to keep the
DataView interface simple, and potentially support encoding into plain JS
arrays and/or non-TypedArray support that appeared to be on the horizon for
JS.



> HTH,
> --
> Ian Hickson   U+1047E)\._.,--,'``.fL
> http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
> Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
>


Re: [whatwg] API for encoding/decoding ArrayBuffers into text

2012-03-13 Thread Tab Atkins Jr.
On Tue, Mar 13, 2012 at 4:11 PM, Glenn Maynard  wrote:
> The API on that wiki page is a reasonable start.  For the same reasons that
> we discussed in a recent thread (
> http://lists.w3.org/Archives/Public/public-webapps/2011JulSep/1589.html),
> conversion errors should use replacement (eg. U+FFFD), not throw
> exceptions.

Python throws errors by default, but both functions have an additional
argument specifying an alternate strategy.  In particular,
bytes.decode can either drop the invalid bytes, replace them with a
replacement char (which I agree should be U+FFFD), or replace them
with XML entities; str.encode can choose to drop characters the
encoding doesn't support.

~TJ


Re: [whatwg] API for encoding/decoding ArrayBuffers into text

2012-03-13 Thread Tab Atkins Jr.
On Tue, Mar 13, 2012 at 4:08 PM, Jonas Sicking  wrote:
> On Tue, Mar 13, 2012 at 3:58 PM, Tab Atkins Jr.  wrote:
>> On Tue, Mar 13, 2012 at 3:49 PM, Jonas Sicking  wrote:
>>> Hi All,
>>>
>>> Something that has come up a couple of times with content authors
>>> lately has been the desire to convert an ArrayBuffer (or part thereof)
>>> into a decoded string. Similarly being able to encode a string into an
>>> ArrayBuffer (or part thereof).
>>>
>>> Something as simple as
>>>
>>> DOMString decode(ArrayBufferView source, DOMString encoding);
>>> ArrayBufferView encode(DOMString source, DOMString encoding,
>>> [optional] ArrayBufferView destination);
>>>
>>> would go a very long way. The question is where to stick these
>>> functions. Internationalization doesn't have an obvious object we can
>>> hang functions off of (unlike, for example crypto), and the above
>>> names are much too generic to turn into global functions.
>>>
>>> Ideas/opinions/bikesheds?
>>
>> Python3 just defines str.encode and bytes.decode.  Can we not do this
>> with String.encode and ArrayBuffer.decode?
>
> Unfortunately I suspect getting anything added on the String object
> will take a few years given that it's too late to get into ES6 (and in
> any case I suspect adding ArrayBuffer dependencies to ES6 would be
> controversial).

Like Ian said, I don't see anything particularly bad about the spec that
defines ArrayBuffers also defining an ArrayBuffer-related method on
String.  There's no reason it has to be in the ES spec.

~TJ


Re: [whatwg] API for encoding/decoding ArrayBuffers into text

2012-03-13 Thread Ian Hickson
On Tue, 13 Mar 2012, Joshua Bell wrote:
> 
> WHATWG makes sense, I just hadn't gotten around to shopping for a home. 
> (Administrivia: Is there need to propose a charter addition?)

You're welcome to use the WHATWG list for this. Charters are pointless and 
there's no need to worry about them here.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] API for encoding/decoding ArrayBuffers into text

2012-03-13 Thread Ian Hickson
On Tue, 13 Mar 2012, Joshua Bell wrote:
> On Tue, Mar 13, 2012 at 4:10 PM, Jonas Sicking  wrote:
> > On Tue, Mar 13, 2012 at 4:08 PM, Kenneth Russell  
> > wrote:
> > > Joshua Bell has been working on a string encoding and decoding API 
> > > that supports the needed encodings, and which is separable from the 
> > > core typed array API:
> > >
> > > http://wiki.whatwg.org/wiki/StringEncoding
> > >
> > > This is the direction I prefer. String encoding and decoding seems 
> > > to be a complex enough problem that it should be expressed 
> > > separately from the typed array spec itself.

Some quick feedback:

 - [OmitConstructor] doesn't seem to be WebIDL

 - please don't allow UAs to implement other encodings. You should list 
   the exact set of supported encodings and the exact labels that should 
   be recognised as meaning those encodings, and disallow all others. 
   Otherwise, we'll be in a never-ending game of reverse-engineering each 
   others' lists of supported encodings and it'll keep growing.

 - What's the use case for supporting anything but UTF-8?

 - Having a mechanism that lets you encode the string and get a length 
   separate from the mechanism that lets you encode the string and get the 
   encoded string seems like it would encourage very inefficient code. Can 
   we instead have a mechanism that returns both at once? Or is the idea 
   that for some encodings getting the encoded length is much quicker than 
   getting the actual string?

 - Seems weird that integers and strings would have such different APIs 
   for doing the same thing. Why can't we handle them equivalently? As in:

 len = view.setString(strings[i],
  offset + Uint32Array.BYTES_PER_ELEMENT,
  "UTF-8");
 view.setUint32(offset, len);
 offset += Uint32Array.BYTES_PER_ELEMENT + len;

HTH,
-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] API for encoding/decoding ArrayBuffers into text

2012-03-13 Thread Joshua Bell
On Tue, Mar 13, 2012 at 4:11 PM, Glenn Maynard  wrote:

> On Tue, Mar 13, 2012 at 5:49 PM, Jonas Sicking  wrote:
>
> > Something that has come up a couple of times with content authors
> > lately has been the desire to convert an ArrayBuffer (or part thereof)
> > into a decoded string. Similarly being able to encode a string into an
> > ArrayBuffer (or part thereof).
> >
>
> There was discussion about this before:
>
>
> https://www.khronos.org/webgl/public-mailing-list/archives//msg00017.html
> http://wiki.whatwg.org/wiki/StringEncoding
>
> (I don't know why it was on the WebGL list; typed arrays are becoming
> infrastructural and this doesn't seem like it belongs there, even though
> ArrayBuffer was started there.)
>

Purely historical; early adopters of Typed Arrays were folks prototyping
with WebGL who wanted to parse data files containing strings.

WHATWG makes sense, I just hadn't gotten around to shopping for a home.
(Administrivia: Is there need to propose a charter addition?)


> The API on that wiki page is a reasonable start.  For the same reasons that
> we discussed in a recent thread (
> http://lists.w3.org/Archives/Public/public-webapps/2011JulSep/1589.html),
> conversion errors should use replacement (eg. U+FFFD), not throw
> exceptions.  The "any" arguments should be fixed.  Encoding to UTF-16
> should definitely not prefix a BOM, and UTF-16 having unspecified
> endianness is obviously bad.
>
> I'd also suggest that, unless there's serious, substantiated demand for
> it--which I doubt--only major Unicode encodings be supported.  Don't make
> it easier for people to keep using legacy encodings.
>
>
Two other pieces of feedback I received from Adam Barth off list:

* take ArrayBufferView as input, which both fixes "any" and simplifies the
API by eliminating byteOffset and byteLength
* support two versions of encode: one which takes a target ArrayBufferView,
and one which allocates/returns a new Uint8Array of the appropriate length.



> > Shouldn't this just be another ArrayBufferView type with special
> > semantics, like Uint8ClampedArray? DOMStringArray or some such? And/or a
> > getString()/setString() method pair on DataView?
>
> I don't think so, because retrieving the N'th decoded/reencoded character
> isn't a constant-time operation.
>
> --
> Glenn Maynard
>



Re: [whatwg] API for encoding/decoding ArrayBuffers into text

2012-03-13 Thread Ian Hickson
On Tue, 13 Mar 2012, Jonas Sicking wrote:
>
> Unfortunately I suspect getting anything added on the String object will 
> take a few years given that it's too late to get into ES6 (and in any 
> case I suspect adding ArrayBuffer dependencies to ES6 would be 
> controversial).

We can just define it outside the ES spec.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] API for encoding/decoding ArrayBuffers into text

2012-03-13 Thread Kenneth Russell
On Tue, Mar 13, 2012 at 6:10 PM, Jonas Sicking  wrote:
> On Tue, Mar 13, 2012 at 4:08 PM, Kenneth Russell  wrote:
>> Joshua Bell has been working on a string encoding and decoding API
>> that supports the needed encodings, and which is separable from the
>> core typed array API:
>>
>> http://wiki.whatwg.org/wiki/StringEncoding
>>
>> This is the direction I prefer. String encoding and decoding seems to
>> be a complex enough problem that it should be expressed separately
>> from the typed array spec itself.
>
> Very cool. Where do I provide feedback to this? Here?

This list seems like a good place to discuss it.

-Ken


Re: [whatwg] API for encoding/decoding ArrayBuffers into text

2012-03-13 Thread Glenn Maynard
On Tue, Mar 13, 2012 at 5:49 PM, Jonas Sicking  wrote:

> Something that has come up a couple of times with content authors
> lately has been the desire to convert an ArrayBuffer (or part thereof)
> into a decoded string. Similarly being able to encode a string into an
> ArrayBuffer (or part thereof).
>

There was discussion about this before:

https://www.khronos.org/webgl/public-mailing-list/archives//msg00017.html
http://wiki.whatwg.org/wiki/StringEncoding

(I don't know why it was on the WebGL list; typed arrays are becoming
infrastructural and this doesn't seem like it belongs there, even though
ArrayBuffer was started there.)

The API on that wiki page is a reasonable start.  For the same reasons that
we discussed in a recent thread (
http://lists.w3.org/Archives/Public/public-webapps/2011JulSep/1589.html),
conversion errors should use replacement (eg. U+FFFD), not throw
exceptions.  The "any" arguments should be fixed.  Encoding to UTF-16
should definitely not prefix a BOM, and UTF-16 having unspecified
endianness is obviously bad.

I'd also suggest that, unless there's serious, substantiated demand for
it--which I doubt--only major Unicode encodings be supported.  Don't make
it easier for people to keep using legacy encodings.

> Shouldn't this just be another ArrayBufferView type with special
> semantics, like Uint8ClampedArray? DOMStringArray or some such? And/or a
> getString()/setString() method pair on DataView?

I don't think so, because retrieving the N'th decoded/reencoded character
isn't a constant-time operation.

-- 
Glenn Maynard


Re: [whatwg] API for encoding/decoding ArrayBuffers into text

2012-03-13 Thread Jonas Sicking
On Tue, Mar 13, 2012 at 4:08 PM, Kenneth Russell  wrote:
> Joshua Bell has been working on a string encoding and decoding API
> that supports the needed encodings, and which is separable from the
> core typed array API:
>
> http://wiki.whatwg.org/wiki/StringEncoding
>
> This is the direction I prefer. String encoding and decoding seems to
> be a complex enough problem that it should be expressed separately
> from the typed array spec itself.

Very cool. Where do I provide feedback to this? Here?

/ Jonas


Re: [whatwg] API for encoding/decoding ArrayBuffers into text

2012-03-13 Thread Jonas Sicking
On Tue, Mar 13, 2012 at 3:58 PM, Tab Atkins Jr.  wrote:
> On Tue, Mar 13, 2012 at 3:49 PM, Jonas Sicking  wrote:
>> Hi All,
>>
>> Something that has come up a couple of times with content authors
>> lately has been the desire to convert an ArrayBuffer (or part thereof)
>> into a decoded string. Similarly being able to encode a string into an
>> ArrayBuffer (or part thereof).
>>
>> Something as simple as
>>
>> DOMString decode(ArrayBufferView source, DOMString encoding);
>> ArrayBufferView encode(DOMString source, DOMString encoding,
>> [optional] ArrayBufferView destination);
>>
>> would go a very long way. The question is where to stick these
>> functions. Internationalization doesn't have an obvious object we can
>> hang functions off of (unlike, for example crypto), and the above
>> names are much too generic to turn into global functions.
>>
>> Ideas/opinions/bikesheds?
>
> Python3 just defines str.encode and bytes.decode.  Can we not do this
> with String.encode and ArrayBuffer.decode?

Unfortunately I suspect getting anything added on the String object
will take a few years given that it's too late to get into ES6 (and in
any case I suspect adding ArrayBuffer dependencies to ES6 would be
controversial).

/ Jonas


Re: [whatwg] API for encoding/decoding ArrayBuffers into text

2012-03-13 Thread Kenneth Russell
Joshua Bell has been working on a string encoding and decoding API
that supports the needed encodings, and which is separable from the
core typed array API:

http://wiki.whatwg.org/wiki/StringEncoding

This is the direction I prefer. String encoding and decoding seems to
be a complex enough problem that it should be expressed separately
from the typed array spec itself.

-Ken


On Tue, Mar 13, 2012 at 5:59 PM, Ian Hickson  wrote:
> On Tue, 13 Mar 2012, Jonas Sicking wrote:
>>
>> Something that has come up a couple of times with content authors
>> lately has been the desire to convert an ArrayBuffer (or part thereof)
>> into a decoded string. Similarly being able to encode a string into an
>> ArrayBuffer (or part thereof).
>>
>> Something as simple as
>>
>> DOMString decode(ArrayBufferView source, DOMString encoding);
>> ArrayBufferView encode(DOMString source, DOMString encoding,
>> [optional] ArrayBufferView destination);
>>
>> would go a very long way. The question is where to stick these
>> functions. Internationalization doesn't have an obvious object we can
>> hang functions off of (unlike, for example crypto), and the above
>> names are much too generic to turn into global functions.
>
> Shouldn't this just be another ArrayBufferView type with special
> semantics, like Uint8ClampedArray? DOMStringArray or some such? And/or a
> getString()/setString() method pair on DataView?
>
> Incidentally I _strongly_ suggest we only support UTF-8 here.
>
> --
> Ian Hickson               U+1047E                )\._.,--,'``.    fL
> http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
> Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] API for encoding/decoding ArrayBuffers into text

2012-03-13 Thread Ian Hickson
On Tue, 13 Mar 2012, Jonas Sicking wrote:
> 
> Something that has come up a couple of times with content authors
> lately has been the desire to convert an ArrayBuffer (or part thereof)
> into a decoded string. Similarly being able to encode a string into an
> ArrayBuffer (or part thereof).
> 
> Something as simple as
> 
> DOMString decode(ArrayBufferView source, DOMString encoding);
> ArrayBufferView encode(DOMString source, DOMString encoding,
> [optional] ArrayBufferView destination);
> 
> would go a very long way. The question is where to stick these
> functions. Internationalization doesn't have an obvious object we can
> hang functions off of (unlike, for example crypto), and the above
> names are much too generic to turn into global functions.

Shouldn't this just be another ArrayBufferView type with special 
semantics, like Uint8ClampedArray? DOMStringArray or some such? And/or a 
getString()/setString() method pair on DataView?

Incidentally I _strongly_ suggest we only support UTF-8 here.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] API for encoding/decoding ArrayBuffers into text

2012-03-13 Thread Tab Atkins Jr.
On Tue, Mar 13, 2012 at 3:49 PM, Jonas Sicking  wrote:
> Hi All,
>
> Something that has come up a couple of times with content authors
> lately has been the desire to convert an ArrayBuffer (or part thereof)
> into a decoded string. Similarly being able to encode a string into an
> ArrayBuffer (or part thereof).
>
> Something as simple as
>
> DOMString decode(ArrayBufferView source, DOMString encoding);
> ArrayBufferView encode(DOMString source, DOMString encoding,
> [optional] ArrayBufferView destination);
>
> would go a very long way. The question is where to stick these
> functions. Internationalization doesn't have an obvious object we can
> hang functions off of (unlike, for example crypto), and the above
> names are much too generic to turn into global functions.
>
> Ideas/opinions/bikesheds?

Python3 just defines str.encode and bytes.decode.  Can we not do this
with String.encode and ArrayBuffer.decode?

~TJ


[whatwg] API for encoding/decoding ArrayBuffers into text

2012-03-13 Thread Jonas Sicking
Hi All,

Something that has come up a couple of times with content authors
lately has been the desire to convert an ArrayBuffer (or part thereof)
into a decoded string. Similarly being able to encode a string into an
ArrayBuffer (or part thereof).

Something as simple as

DOMString decode(ArrayBufferView source, DOMString encoding);
ArrayBufferView encode(DOMString source, DOMString encoding,
[optional] ArrayBufferView destination);

would go a very long way. The question is where to stick these
functions. Internationalization doesn't have an obvious object we can
hang functions off of (unlike, for example crypto), and the above
names are much too generic to turn into global functions.

Ideas/opinions/bikesheds?

/ Jonas
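A sketch of how the proposed decode() might be used to parse a length-prefixed string record (the record layout and the readString helper are hypothetical, and the ASCII-only decode stub below merely stands in for a real implementation):

```javascript
// Stub for the proposed decode(view, encoding) API; ASCII only, just
// enough to make the sketch self-contained.
function decode(view, encoding) {
  var s = "";
  for (var i = 0; i < view.length; i++) {
    s += String.fromCharCode(view[i]);
  }
  return s;
}

// Hypothetical record layout: [uint32 byte length][encoded string bytes].
function readString(buffer, offset) {
  var dv = new DataView(buffer);
  var len = dv.getUint32(offset);                       // length prefix
  var bytes = new Uint8Array(buffer, offset + 4, len);  // payload view
  return decode(bytes, "UTF-8");
}
```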


Re: [whatwg] Validator.nu: "Attribute role not allowed on element h2 at this point."

2012-03-13 Thread Benjamin Hawkes-Lewis
2012/3/13 Scott González :
> It's my understanding that authors should only apply ARIA via script.

No. Where do you understand that from?

> The
> redundancy cases seem to be the most reasonable use cases I've heard of for
> wanting ARIA in the initial markup, but even that seems wrong. What happens
> when you have type=range and role=slider, the UA doesn't understand the new
> types, and the script either never loads or has an error? The AT will pick
> up the role, but none of the functionality will be there. I don't see how
> that's better than not having the role applied.

First, there are ARIA annotations that do not depend on JS but do
overlap with native semantics, e.g.:







Second, plenty of authors produce HTML that does not work without JS.
This isn't good practice, but there's little reason to discourage use
of ARIA markup _especially_ in those cases, where we do not discourage
other JS-dependent initial markup like (say):

 (without an associated form)

or



etc.

--
Benjamin Hawkes-Lewis


Re: [whatwg] Validator.nu: "Attribute role not allowed on element h2 at this point."

2012-03-13 Thread Kornel Lesiński
On Tue, 13 Mar 2012 15:57:57 -, Hugh Guiney   
wrote:



The validator is probably just not up to date.

Note that in this case the validator is probably right. If it's  
just presentational, why are you using <hgroup>? It doesn't seem  
presentational to me. I think you are incorrectly using  
role=presentational here.


I am using it because VoiceOver does not understand <hgroup>/document
outlines yet, and so announces two headings when there should only be
one. It is not ideal markup; I'm merely trying to provide a better
experience for AT users until new elements and parsing models are
understood.


Dusting off the <hsub> proposal, which doesn't have that problem:

http://www.w3.org/html/wg/wiki/ChangeProposals/hSub

--
regards, Kornel Lesiński


Re: [whatwg] Validator.nu: "Attribute role not allowed on element h2 at this point."

2012-03-13 Thread Hugh Guiney
On Mon, Mar 12, 2012 at 8:52 PM, Ian Hickson  wrote:
> The validator is probably just not up to date.
>
> Note that in this case the validator is probably right. If it's just
> presentational, why are you using <hgroup>? It doesn't seem presentational to
> me. I think you are incorrectly using role=presentational here.

I am using it because VoiceOver does not understand <hgroup>/document
outlines yet, and so announces two headings when there should only be
one. It is not ideal markup; I'm merely trying to provide a better
experience for AT users until new elements and parsing models are
understood.


Re: [whatwg] Validator.nu: "Attribute role not allowed on element h2 at this point."

2012-03-13 Thread Scott González
It's my understanding that authors should only apply ARIA via script. The
redundancy cases seem to be the most reasonable use cases I've heard of for
wanting ARIA in the initial markup, but even that seems wrong. What happens
when you have type=range and role=slider, the UA doesn't understand the new
types, and the script either never loads or has an error? The AT will pick
up the role, but none of the functionality will be there. I don't see how
that's better than not having the role applied.
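As a hedged sketch of that script-applied pattern (helper name hypothetical): the role is added by the same code that implements the widget's behaviour, so a failed script load leaves no orphaned role for the AT to pick up.

```javascript
// Hypothetical enhancement helper: ARIA is applied only by the script
// that supplies the slider behaviour, never in the initial markup.
function enhanceSlider(el) {
  if (el.type === 'range') {
    return false; // UA understands the type; native semantics suffice
  }
  el.setAttribute('role', 'slider'); // the fallback widget gets the role...
  el.setAttribute('aria-valuemin', el.getAttribute('min') || '0');
  el.setAttribute('aria-valuemax', el.getAttribute('max') || '100');
  el.setAttribute('aria-valuenow', el.getAttribute('value') || '0');
  // ...and the keyboard/pointer handling would be wired up here too,
  // so role and functionality arrive -- or fail -- together.
  return true;
}
```

If the script never runs, the element remains a plain input with no stale role=slider for the AT to announce.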


On Tue, Mar 13, 2012 at 3:11 AM, Charles Pritchard  wrote:

> On 3/12/12 11:42 PM, Simon Pieters wrote:
>
>> On Tue, 13 Mar 2012 02:16:29 +0100, Charles Pritchard 
>> wrote:
>>
>>  Warnings are generally not useful. Either something is fine and we should
>>  support it, or it's wrong and we should alert the author. I think "must"
>>  is very much the appropriate requirement level here.

>>>
>>>  From the implementation-side, the spec is wrong: it ranks native HTML
>>> semantics above ARIA DOM semantics.
>>>
>>
>> You're confusing author conformance requirements with UA conformance
>> requirements.
>>
>
> The section did confuse me. It lays out some author requirements then goes
> into what looks like appropriate UA mapping.
> I don't see this working well for ARIA conformance testing, but I do like
> the mapping.
>
> This document tries to set strict requirements on authors for ARIA usage
> that don't exist in practice.
> It's intended to help, but I don't think it's needed; I believe it adds
> confusion.
>
> The Restrictions seem fine for telling vendors that they ought to be
> making their ARIA maps from native HTML to ARIA a certain way.
> But, as you said, I'm getting confused reading this doc.
>
>
>
>  As a "best practices" note, it seems overly optimistic. There are
>>> situations with AT navigation where role conflicts do occur and/or
>>> redundancy in tagging is helpful.
>>>
>>
>> Do you have concrete examples?
>>
>
> Concrete? No. I don't have an active JAWS/NVDA/WindowEyes + HTML4 project
> in front of me.
> If I did, I'm sure I'd have some concrete examples of how ARIA and HTML4
> work together with roles.
>
> Some wild guesses:
> Treating a link as a button or a button as a link.
> @disabled and aria-disabled may be used via reference with aria-controls.
> type="range" and role=slider for redundancy.
> various styling tricks with css selectors.
>
> Steve Faulkner posted that sometimes explicit ARIA roles signal to ATs to
> look for more ARIA attributes.
>
> I've used role and/or redundant ARIA within the scripting environment to
> minimize calls in applications checking for roles. Redundancy doesn't harm
> anything, I actively promote it, as it does help, sometimes. Conflicts can
> be a bad thing, they can lead to nonsensical or non-interactive
> reporting by ATs. I realize that, but I'd err on the side of allowing
> authors to make those decisions. They can use various tools that spit out
> warnings.
>
> Ian has stated that warnings aren't very useful, he's looking for error or
> bust. That's confusing when it comes to ARIA testing, as it's more about
> the pragmatic effects of applying semantics and using a variety of ATs to
> test them.
>
>
>  I don't believe it is appropriate for HTML to place restrictions on ARIA
>>> DOM. It does not reflect implementations.
>>>
>>
>> It does not affect implementations at all.
>>
>
> Then I'm less concerned. My understanding was that this part of the
> specification is intended to affect implementations such that an author's
> use of @role in a tag would be overridden by the browser if that tag is on
> the conflict list.
>
>
>  The HTML spec should only specify what the default mappings are for HTML
>>> elements to ARIA.
>>> Authors may be advised to test AT software with their product.
>>>
>>> This statement is more in line with practice: "Authors must test
>>> accessibility tree as part of development and usage of ARIA semantics.".
>>>
>>
>> That's not machine checkable so less likely to have an effect at all.
>>
> So the "authors must" is for conformance tools? Again, it seems to be
> adding confusion. I'm not the only one.
>
> It looks like a good section explaining mapping to implementers that has
> been turned into a wiffle bat for bopping weary authors on the head.
>
> ARIA is a tool for supporting secondary UAs, not an extension to HTML
> Forms and groups. An aria role does absolutely nothing to alter the
> behavior of the primary UA.
>
> -Charles
>


Re: [whatwg] [media] startOffsetTime, also add startTime?

2012-03-13 Thread Philip Jägenstedt
On Fri, 09 Mar 2012 15:40:26 +0100, Philip Jägenstedt   
wrote:



let me first try to summarize what I think the spec says:

* currentTime need not start at 0, for streams it will typically  
represent for how long the server has been serving a stream.


* duration is not the duration, it is the last timestamp of a resource.

* startOffsetTime is the date at time 0; it's not an offset. It has  
nothing to do with syncing live streams.


* initialTime is the first timestamp of the stream or the start time of  
a media fragment URL, if one is used.


* For chained streams, the 2nd and subsequent clips have their timelines  
normalized and appended to the first clip's timeline.
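
As a concrete reading of the startOffsetTime point: if it really is the date at media-timeline zero, the wall-clock time of the current playback position falls out directly. A sketch, assuming startOffsetTime is a Date and currentTime is in seconds:

```javascript
// Wall-clock date of the current playback position, under the reading
// above that startOffsetTime is the date at media-timeline time 0.
function currentDate(startOffsetTime, currentTime) {
  return new Date(startOffsetTime.getTime() + currentTime * 1000);
}
```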


I think this is mostly correct, but Odin pointed out to me this section of  
the spec:


"In the absence of an explicit timeline, the zero time on the media  
timeline should correspond to the first frame of the media resource. For  
static audio and video files this is generally trivial. For streaming  
resources, if the user agent will be able to seek to an earlier point than  
the first frame originally provided by the server, then the zero time  
should correspond to the earliest seekable time of the media resource;  
otherwise, it should correspond to the first frame received from the  
server (the point in the media resource at which the user agent began  
receiving the stream)."


There are multiple problems here, and I think it's responsible for some of  
the confusion.


* What is an "explicit timeline"? For example, does an Ogg stream that  
starts with a non-zero timestamp have an explicit timeline?


* Does "For streaming resources ..." apply only in the absence of an  
explicit timeline, or in general? In other words, what's the scope of "In  
the absence of an explicit timeline"?


* Why does the spec differentiate between static and streaming resources  
at all? This is not a distinction Opera makes internally, the only "mode  
switch" we have depends on whether or not a resource is seekable, which  
for HTTP means support for byte-range requests. A static resource can be  
served by a server without support for byte-range requests such that the  
size and duration are known up front, and I certainly wouldn't call that  
streaming.


These definitions can be tweaked/clarified in one of two ways:

1. currentTime always reflects the underlying timestamps, such that a  
resource can start playing at a non-zero offset and seekable.start(0)  
could be non-zero even for a fully seekable resource. This is what the  
spec already says, modulo the "streaming resources" weirdness.


2. Always normalize the timeline to start at 0 and end at duration.

I think that the BBC blog post is favoring option 2, and while that's  
closest to our implementation I don't feel strongly about it. A benefit of  
option 1 is that currentTime=300 represents the same thing on all clients,  
which should solve the syncing problem without involving any kinds of  
dates.
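
The difference between the two options is just arithmetic, which can be sketched in a few lines (variable names hypothetical; this is not an implementation):

```javascript
// Option 1: currentTime exposes the resource's raw timestamps, so the
// earliest seekable time may be non-zero. A page wanting a zero-based
// timeline (option 2's behaviour) can derive it itself:
function normalizedTime(rawCurrentTime, earliestTimestamp) {
  return rawCurrentTime - earliestTimestamp;
}

// Under option 1, rawCurrentTime=300 denotes the same media moment on
// every client, so two viewers of a live stream can sync by comparing
// raw times directly, with no dates involved:
function inSync(rawTimeA, rawTimeB, toleranceSeconds) {
  return Math.abs(rawTimeA - rawTimeB) <= toleranceSeconds;
}
```

The reverse derivation is impossible: once the UA has normalized the timeline (option 2), the raw timestamps are gone and clients can no longer sync this way.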


To sum up, here's the spec changes I still think should be made:

* Make it pedantically clear which of the above two options is correct,  
preferably with a pretty figure of a timeline with all the values clearly  
marked out.


* Rename startOffsetTime to make it clear that it represents the date at  
currentTime=0 and document that it's intended primarily for display. I  
wouldn't object to just dropping it until we expose other kinds of  
metadata like producer/location, but don't care deeply.


* Drop initialTime.

--
Philip Jägenstedt
Core Developer
Opera Software


Re: [whatwg] Validator.nu: "Attribute role not allowed on element h2 at this point."

2012-03-13 Thread Charles Pritchard

On 3/12/12 11:42 PM, Simon Pieters wrote:
On Tue, 13 Mar 2012 02:16:29 +0100, Charles Pritchard 
 wrote:


Warnings are generally not useful. Either something is fine and we 
should
support it, or it's wrong and we should alert the author. I think 
"must"

is very much the appropriate requirement level here.


 From the implementation-side, the spec is wrong: it ranks native 
HTML semantics above ARIA DOM semantics.


You're confusing author conformance requirements with UA conformance 
requirements.


The section did confuse me. It lays out some author requirements then 
goes into what looks like appropriate UA mapping.
I don't see this working well for ARIA conformance testing, but I do 
like the mapping.


This document tries to set strict requirements on authors for ARIA usage 
that don't exist in practice.
It's intended to help, but I don't think it's needed; I believe it adds 
confusion.


The Restrictions seem fine for telling vendors that they ought to be 
making their ARIA maps from native HTML to ARIA a certain way.

But, as you said, I'm getting confused reading this doc.


As a "best practices" note, it seems overly optimistic. There are 
situations with AT navigation where role conflicts do occur and/or 
redundancy in tagging is helpful.


Do you have concrete examples?


Concrete? No. I don't have an active JAWS/NVDA/WindowEyes + HTML4 
project in front of me.
If I did, I'm sure I'd have some concrete examples of how ARIA and HTML4 
work together with roles.


Some wild guesses:
Treating a link as a button or a button as a link.
@disabled and aria-disabled may be used via reference with aria-controls.
type="range" and role=slider for redundancy.
various styling tricks with css selectors.

Steve Faulkner posted that sometimes explicit ARIA roles signal to ATs 
to look for more ARIA attributes.


I've used role and/or redundant ARIA within the scripting environment to 
minimize calls in applications checking for roles. Redundancy doesn't 
harm anything, I actively promote it, as it does help, sometimes. 
Conflicts can be a bad thing, they can lead to nonsensical or 
non-interactive reporting by ATs. I realize that, but I'd err on the 
side of allowing authors to make those decisions. They can use various 
tools that spit out warnings.


Ian has stated that warnings aren't very useful, he's looking for error 
or bust. That's confusing when it comes to ARIA testing, as it's more 
about the pragmatic effects of applying semantics and using a variety of 
ATs to test them.


I don't believe it is appropriate for HTML to place restrictions on 
ARIA DOM. It does not reflect implementations.


It does not affect implementations at all.


Then I'm less concerned. My understanding was that this part of the 
specification is intended to affect implementations such that an author's 
use of @role in a tag would be overridden by the browser if that tag is 
on the conflict list.


The HTML spec should only specify what the default mappings are for 
HTML elements to ARIA.

Authors may be advised to test AT software with their product.

This statement is more in line with practice: "Authors must test 
accessibility tree as part of development and usage of ARIA semantics.".


That's not machine checkable so less likely to have an effect at all.
So the "authors must" is for conformance tools? Again, it seems to be 
adding confusion. I'm not the only one.


It looks like a good section explaining mapping to implementers that has 
been turned into a wiffle bat for bopping weary authors on the head.


ARIA is a tool for supporting secondary UAs, not an extension to HTML 
Forms and groups. An aria role does absolutely nothing to alter the 
behavior of the primary UA.


-Charles