[whatwg] postMessage: max length / size

2009-10-23 Thread Brian Kuhn
Is there any limit to the length of message you can send with postMessage
(HTML5 Cross-document messaging)?

I didn't see anything in the spec about this.  I thought this might be one
area where implementations might end up differing.

Thanks,
Brian


Re: [whatwg] postMessage: max length / size

2009-10-23 Thread Ian Hickson
On Thu, 22 Oct 2009, Brian Kuhn wrote:

 Is there any limit to the length of message you can send with 
 postMessage (HTML5 Cross-document messaging)?
 
 I didn't see anything in the spec about this.  I thought this might be 
 one area where implementations might end up differing.

There are probably implementation-specific limits, but HTML tries to not 
say what the limits should be, since it's hard to know what they should 
be. It might vary from platform to platform and device to device, and will 
almost certainly vary over time.
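
Since any limits are implementation-specific and unspecified, one way an author might cope is to split a large string into smaller pieces and reassemble them on the receiving side. The following is only a minimal sketch under stated assumptions: the helper names splitIntoChunks and joinChunks, the chunk record shape, and the 64 KiB default are all hypothetical and come neither from the spec nor from any implementation.

```javascript
// Arbitrary illustrative default; NOT a limit taken from any browser or spec.
const DEFAULT_CHUNK_LENGTH = 64 * 1024;

// Split a string into ordered chunk records that can be posted one by one.
// Lengths are counted in UTF-16 code units, as with String.prototype.slice.
function splitIntoChunks(text, chunkLength = DEFAULT_CHUNK_LENGTH) {
  if (!Number.isInteger(chunkLength) || chunkLength <= 0) {
    throw new RangeError("chunkLength must be a positive integer");
  }
  // An empty string still yields one (empty) chunk so the receiver gets a message.
  const total = Math.max(1, Math.ceil(text.length / chunkLength));
  const chunks = [];
  for (let index = 0; index < total; index++) {
    chunks.push({
      index,
      total,
      data: text.slice(index * chunkLength, (index + 1) * chunkLength),
    });
  }
  return chunks;
}

// Rebuild the original string once every chunk has arrived, in any order.
function joinChunks(chunks) {
  if (chunks.length === 0 || chunks.length !== chunks[0].total) {
    throw new Error("missing chunks");
  }
  return [...chunks]
    .sort((a, b) => a.index - b.index)
    .map((chunk) => chunk.data)
    .join("");
}

// Possible use in a page (targetWindow and targetOrigin are placeholders):
//   for (const chunk of splitIntoChunks(bigString)) {
//     targetWindow.postMessage(chunk, targetOrigin);
//   }
```

Whether chunking is needed at all depends on the user agent; the sketch only shows that a sender does not have to rely on a single very large message.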

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2009-10-23 Thread NARUSE, Yui


Ian Hickson wrote:
 Authors should not use JIS-X-0208 (JIS_C6226-1983), JIS-X-0212
 (JIS_X0212-1990), encodings based on ISO-2022, and encodings based on
 EBCDIC.
 It is not clear what this means (e.g., the character set JIS_C6226-1983 in
 any encoding, or only when encoded alone according to RFC1345 as described
 above); 
 
 This is talking about character encodings, not character sets. 
 JIS_C6226-1983 is a registered character encoding in the IANA registry.

Yes, I can understand this, but...

 On Fri, 23 Oct 2009, NARUSE, Yui wrote:
 Authors should not use JIS-X-0208 (JIS_C6226-1983), JIS-X-0212 
 (JIS_X0212-1990), encodings based on ISO-2022, and encodings based 
 on EBCDIC.
 First, JIS-X-0208 and JIS-X-0212 are not in IANA Charsets, moreover 
 those correct names as spec are JIS X 0208 and JIS X 0212.
 
 On Thu, 22 Oct 2009, Øistein E. Andersen wrote:
 I am not sure what you mean; they are both listed at
 http://www.iana.org/assignments/character-sets:

 Name: JIS_C6226-1983 [RFC1345,KXS2]
 MIBenum: 63
 Source: ECMA registry
 Alias: iso-ir-87
 Alias: x0208
 Alias: JIS_X0208-1983
 Alias: csISO87JISX0208

 Name: JIS_X0212-1990 [RFC1345,KXS2]
 MIBenum: 98
 Source: ECMA registry
 Alias: x0212
 Alias: iso-ir-159
 Alias: csISO159JISX02121990
 
 On Fri, 23 Oct 2009, NARUSE, Yui wrote:
 Where is the word JIS-X-0208 ?
 Where is the word JIS-X-0212 ?
 
 The exact string isn't there, that's why I included the preferred MIME 
 names in brackets in the spec.

if it is talking about character encodings,
why it uses the name of character sets mainly?
Following seems better.

 Authors should not use JIS_C6226-1983, JIS_X0212-1990,
 encodings based on ISO-2022, and encodings based 

 On Fri, 23 Oct 2009, NARUSE, Yui wrote:
 Second, JIS_C6226-1983, JIS_X0212-1990, and EBCDICs are not
 ASCII compatible. So they are out of discouraged; mustn't use.
 
 You can use non-ASCII-compatible encodings (e.g. UTF-16).

I see.

-- 
NARUSE, Yui  nar...@airemix.jp


Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2009-10-23 Thread Ian Hickson
On Fri, 23 Oct 2009, NARUSE, Yui wrote:
  
  The exact string isn't there, that's why I included the preferred MIME 
  names in brackets in the spec.
 
 if it is talking about character encodings,
 why it uses the name of character sets mainly?
 Following seems better.

  Authors should not use JIS_C6226-1983, JIS_X0212-1990,
  encodings based on ISO-2022, and encodings based 

Ok, done.



[whatwg] HTMLElement.onload

2009-10-23 Thread Markus Ernst
In 6.5.6.2 of the spec I found that the onload event handler is now 
available for every HTML element in HTML5, which I think is a great 
improvement. But there is something about the load event that I think 
would be worth a few words of clarification.


According to 6.11.2 the load event is fired when the whole document is 
loaded; I did not find anything about element-specific load events. So I 
assume that element1.onload is triggered by the same event as 
element2.onload - the following two bodies would be equivalent:


<body>
  <p onload="dosomething(this)">Text</p>
  <p onload="dosomethingelse(this)">Text</p>
</body>

<body onload="dosomething(document.getElementById('foo'));
  dosomethingelse(document.getElementById('bar'))">
  <p id="foo">Text</p>
  <p id="bar">Text</p>
</body>

Is this assumption correct?

Generally, the list of events that must be supported by all HTML 
elements looks somewhat confusing to me, as there are some events that 
only apply to special types of elements, such as media players or forms 
and form elements. How are e.g. onpause or oninput supposed to work if 
applied to span or p elements?


[whatwg] Typo in Annotations for assistive technology products (ARIA) section

2009-10-23 Thread Mark Pilgrim
th elemen that is neither a column header nor a row header

should read

th element that is neither a column header nor a row header

-Mark


[whatwg] Another typo in Annotations for assistive technology products (ARIA) section

2009-10-23 Thread Mark Pilgrim
Either a button element or an input element is required to when using
the button role

should read

Either a button element or an input element is required when using
the button role

-Mark


Re: [whatwg] postMessage: max length / size

2009-10-23 Thread Drew Wilson
As a data point, the WebKit implementation (used by Safari and Chrome)
doesn't currently enforce any limits (other than those imposed by running
out of memory).
-atw

On Fri, Oct 23, 2009 at 12:02 AM, Ian Hickson i...@hixie.ch wrote:

 On Thu, 22 Oct 2009, Brian Kuhn wrote:
 
  Is there any limit to the length of message you can send with
  postMessage (HTML5 Cross-document messaging)?
 
  I didn't see anything in the spec about this.  I thought this might be
  one area where implementations might end up differing.

 There are probably implementation-specific limits, but HTML tries to not
 say what the limits should be, since it's hard to know what they should
 be. It might vary from platform to platform and device to device, and will
 almost certainly vary over time.




Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2009-10-23 Thread Øistein E . Andersen

On 23 Oct 2009, at 04:20, Ian Hickson wrote:


On Wed, 21 Oct 2009, Øistein E. Andersen wrote:





ASCII-compatibility:
The note in ‘2.1.5 Character encodings’ seems to say that [...]
ISO-2022’[-*] are ASCII-compatible, whereas HZ-GB-2312 is not, and I cannot
find anything in Section 2.1.5 that would explain this difference.


HZ-GB-2312 uses the byte ASCII uses for ~ as the escape character.
ISO-2022-* uses the control codes. That's the difference.


'~'/0x7E is not (and should not be, as far as I can tell) relevant for  
HTML5's concept of ASCII compatibility.



Discouraged encodings: [...]


Authors should not use JIS-X-0208 (JIS_C6226-1983), JIS-X-0212
(JIS_X0212-1990), [...]


It is not clear what this means [...]


This is talking about character encodings, not character sets.
JIS_C6226-1983 is a registered character encoding in the IANA  
registry.


(This is less confusing now since HTML5 only deals with character
encodings and the strings match those in the IANA registry as
suggested by Yui Naruse.)



the list of discouraged encodings seems conspicuously short if it is
supposed to be complete; and the lack of rationale makes it difficult to
understand why these encodings are considered particularly harmful
(JIS_C6226-1983 v. JIS_C6226-1978 or ISO-2022 v. HZ, to mention but two
at least initially puzzling cases).


The reason for including these is to discourage encodings known to have
security issues. I've added HZ-GB-2312, which can be used in a similarly
dangerous fashion. (Basically the danger for user agents is in an attacker
using an encoding that a user agent could autodetect, while a site
interprets the bytes safely; that would allow those encodings to be used
to smuggle script elements in a way that a naive whitelisting filter
would think is safe.)

It might be better to say *why* particular encodings are better avoided,
whether or not the list of discouraged encodings be presented as
definitive.


I've added a note.

[...]

On Thu, 22 Oct 2009, Philip Taylor wrote:


The string [숍訊昱穿] encoded as ISO-2022-KR is the bytes 0e 3c 73
63 72 69 70 74 3e. A UA that doesn't support ISO-2022-KR (e.g. Chrome,
when I last checked) will decode it as Windows-1252 and get the string
"<script>", which is bad. So a site that uses ISO-2022-KR is very likely
to expose some users to XSS attacks, which seems like a good reason to
discourage that encoding. The same applies to other ISO-2022 encodings.
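
The byte-level effect described above can be checked by hand. The sketch below is illustrative only: the helper name decodeLowBytes is hypothetical, and it relies on the fact that Windows-1252 maps every byte below 0x80 to the code point with the same value, so no real Windows-1252 decoder is needed for these nine bytes.

```javascript
// The nine bytes quoted in the example above.
const bytes = [0x0e, 0x3c, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x3e];

// Decode bytes below 0x80 the way an ASCII-compatible single-byte decoder
// (such as Windows-1252) would: each byte becomes the same-valued code point.
function decodeLowBytes(byteValues) {
  return byteValues
    .map((b) => {
      if (b < 0 || b >= 0x80) {
        throw new RangeError("this sketch only handles bytes below 0x80");
      }
      return String.fromCharCode(b);
    })
    .join("");
}

const seen = decodeLowBytes(bytes);
console.log(JSON.stringify(seen)); // "\u000e<script>"
```

The leading 0x0e (Shift Out) survives only as a control character, and the remaining bytes read as markup.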


[...]

On Thu, 22 Oct 2009, Øistein E. Andersen wrote:


If that is the reason, at least HZ encoding would seem to be affected as
well. Explicitly discouraging a more or less random subset of the
problematic encodings without providing rationale makes it difficult to
assess whether or not other, somewhat similar, encodings should be
avoided as well, which was the main issue I wanted to raise.


Hopefully this is somewhat addressed now.



The added note certainly helps, but it is vague (does "[m]ost of these  
encodings" mean "all the encodings mentioned above apart from  
UTF-32"?) and inaccurate (Philip Taylor's example does not rely on  
bugs).


Given that the set of encodings is open-ended, I still think it would  
be preferable to make the rationale (a definition of what makes an  
encoding problematic) primary and mention actual encodings as  
examples. This could give something like the following: Encodings in  
which a series of bytes in the range 0x20..0x7E may encode characters  
other than the corresponding characters in the range U+20..U+7E  
represent a potential security vulnerability since a browser that does  
not support the encoding (or does not support the label used to  
declare the encoding, or does not use the same mechanism to detect the  
encoding of unlabelled content) might end up interpreting technically  
benign plain text content as HTML tags and JavaScript.  In particular,  
this applies to encodings in which the bytes corresponding to  
'script' in ASCII may encode a different string. Authors should not  
use such encodings, which are known to include  In addition,  
authors should not use UTF-32  Alternatively, fixing the current  
note would help and might be sufficient, albeit not ideal.
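
To make the proposed criterion concrete, a site could inspect the raw bytes it is about to serve instead of the text as decoded in the declared encoding, since the raw bytes are what a browser lacking support for that encoding would fall back to reading as ASCII. This is a rough sketch under assumptions, not a complete filter: the function name containsAsciiScriptTag is hypothetical, and it looks only for the single pattern "<script" (case-insensitively).

```javascript
// ASCII bytes for "<script" (lowercase).
const SCRIPT_OPEN = [0x3c, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74];

// Return true if the raw bytes contain "<script" when read as ASCII,
// regardless of which encoding the content is declared to use.
function containsAsciiScriptTag(byteValues) {
  outer: for (let i = 0; i + SCRIPT_OPEN.length <= byteValues.length; i++) {
    for (let j = 0; j < SCRIPT_OPEN.length; j++) {
      let b = byteValues[i + j];
      // Fold ASCII A-Z to a-z so "<SCRIPT" is caught as well.
      if (b >= 0x41 && b <= 0x5a) b += 0x20;
      if (b !== SCRIPT_OPEN[j]) continue outer;
    }
    return true;
  }
  return false;
}

// The ISO-2022-KR bytes from Philip Taylor's example are flagged:
console.log(
  containsAsciiScriptTag([0x0e, 0x3c, 0x73, 0x63, 0x72, 0x69, 0x70, 0x74, 0x3e])
); // true
```

A check like this only illustrates why the byte range 0x20..0x7E matters; it does not replace avoiding the problematic encodings in the first place.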


I think one has to realise that a comprehensive list of problematic  
encodings is an elusive goal and act accordingly.


--
Øistein E. Andersen


PS: The following sentence makes little sense without (curly) quotes  
and apostrophes. In case they disappeared before you read it, please  
find it repeated below with (ASCII) quotes and apostrophes:


It should probably be ‘advise against authors'’ using legacy encodings
or better ‘advise authors against using legacy encodings’.


(The current text in the spec is fine.)

Re: [whatwg] Typo in Annotations for assistive technology products (ARIA) section

2009-10-23 Thread Ian Hickson
On Fri, 23 Oct 2009, Mark Pilgrim wrote:

 th elemen that is neither a column header nor a row header
 
 should read
 
 th element that is neither a column header nor a row header

Fixed.



Re: [whatwg] Another typo in Annotations for assistive technology products (ARIA) section

2009-10-23 Thread Ian Hickson
On Fri, 23 Oct 2009, Mark Pilgrim wrote:

 Either a button element or an input element is required to when using 
 the button role
 
 should read
 
 Either a button element or an input element is required when using the 
 button role

Fixed.



Re: [whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2009-10-23 Thread Ian Hickson
On Fri, 23 Oct 2009, Øistein E. Andersen wrote:
 On 23 Oct 2009, at 04:20, Ian Hickson wrote:
  On Wed, 21 Oct 2009, Øistein E. Andersen wrote:
  
   ASCII-compatibility:
   The note in ‘2.1.5 Character encodings’ seems to say that [...]
   ISO-2022’[-*] are ASCII-compatible, whereas HZ-GB-2312 is not, and I
   cannot
   find anything in Section 2.1.5 that would explain this difference.
  
  HZ-GB-2312 uses the byte ASCII uses for ~ as the escape character.
  ISO-2022-* uses the control codes. That's the difference.
 
 '~'/0x7E is not (and should not be, as far as I can tell) relevant for HTML5's
 concept of ASCII compatibility.

Good point. Moved the encoding over to the other side.


 The added note certainly helps, but it is vague (does [m]ost of these 
 encodings mean all the encodings mentioned above apart from UTF-32?) 
 and inaccurate (Philip Taylor's example does not rely on bugs).
 
 Given that the set of encodings is open-ended, I still think it would be 
 preferable to make the rationale (a definition of what makes an encoding 
 problematic) primary and mention actual encodings as examples. This 
 could give something like the following: Encodings in which a series of 
 bytes in the range 0x20..0x7E may encode characters other than the 
 corresponding characters in the range U+20..U+7E represent a potential 
 security vulnerability since a browser that does not support the 
 encoding (or does not support the label used to declare the encoding, or 
 does not use the same mechanism to detect the encoding of unlabelled 
 content) might end up interpreting technically benign plain text content 
 as HTML tags and JavaScript.  In particular, this applies to encodings 
 in which the bytes corresponding to 'script' in ASCII may encode a 
 different string. Authors should not use such encodings, which are known 
 to include  In addition, authors should not use UTF-32  
 Alternatively, fixing the current note would help and might be 
 sufficient, albeit not ideal.

I've reworded the spec based on your suggestion. Thanks!
