[whatwg] Tree construction: parse error and plaintext

2008-03-12 Thread Thomas Broyer
In the "in body" insertion mode, shouldn't the end-of-file token
case have special handling for when the current node is a plaintext
element, and not generate a parse error in that case?

The current behavior is that if you use plaintext, you'll have a
parse error at EOF. Is this intended?

-- 
Thomas Broyer


Re: [whatwg] Tree construction: parse error and plaintext

2008-03-12 Thread Anne van Kesteren
On Wed, 12 Mar 2008 16:53:52 +0100, Thomas Broyer [EMAIL PROTECTED]  
wrote:

 In the "in body" insertion mode, shouldn't the end-of-file token
 case have special handling for when the current node is a plaintext
 element, and not generate a parse error in that case?
 
 The current behavior is that if you use plaintext, you'll have a
 parse error at EOF. Is this intended?


Yes:  
http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2007-January/009113.html



--
Anne van Kesteren
http://annevankesteren.nl/
http://www.opera.com/


Re: [whatwg] HTML 5 - comments on 5.6 Command APIs

2008-03-12 Thread Krzysztof Żelechowski

On Tue, 11-03-2008, at 19:31 +, Tom Gilder wrote:
 On Tue, Mar 11, 2008 at 6:15 PM, Krzysztof Żelechowski
 [EMAIL PROTECTED] wrote:
   I see no point in returning true when there are no links to remove. IE
   and Opera currently only return true when the selection contains a
   link. WebKit follows the current HTML 5 wording.
 
  Unlink means "Remove all links".
  There is no point in removing all links when there are none,
  but there is no harm either.
  Methinks Unlink should be enabled in this case.
 
 queryCommandEnabled is primarily, I would argue, used to update UIs
 (especially enabling/disabling toolbar buttons) to show whether the
 command will currently have any effect on the document.
 
 There is indeed never any harm in calling execCommand('Unlink'), you
 can call it as much as you like at any point without raising an
 exception, but execCommandEnabled is surely about whether the call
 will actually achieve anything.
 
 Following your logic, queryCommandEnabled('Undo') could always return
 true, because there's no harm in trying to undo even when there's
 nothing to undo.

I have mixed feelings about this.
I admit there is no practical harm; however,
Undo means "undo the latest action",
and it is an error to say "take your hat off"
to somebody who does not wear one.
It resembles popping from an empty stack;
this action usually throws.
On the other hand,
it is not an error to say "Give me all your money"
to somebody who does not have any.

 
 execCommandEnabled is pointless unless it actually returns a useful
 value as to if it's going to do something.

I would read it more literally: it is enabled when it can be executed.  
Your version should be called execCommandEffective.
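
The two readings discussed above can be contrasted in a small sketch.
This is a purely illustrative Python model, not the HTML 5 Command API:
the class and method names are hypothetical, and the behaviour for each
command is only what the messages above argue for.

```python
class Editor:
    """Toy editor state (hypothetical; not the HTML 5 Command API)."""

    def __init__(self, links_in_selection=0, undo_stack=None):
        self.links_in_selection = links_in_selection
        self.undo_stack = list(undo_stack or [])

    def enabled_literal(self, command):
        # Literal reading: enabled whenever the command can be executed.
        if command == "Unlink":
            return True  # removing all of zero links is harmless
        if command == "Undo":
            return bool(self.undo_stack)  # like popping from an empty stack
        raise ValueError(command)

    def enabled_effective(self, command):
        # "Effective" reading: enabled only if executing would change something.
        if command == "Unlink":
            return self.links_in_selection > 0
        if command == "Undo":
            return bool(self.undo_stack)
        raise ValueError(command)


editor = Editor(links_in_selection=0)
print(editor.enabled_literal("Unlink"), editor.enabled_effective("Unlink"))
```

In this model the two readings differ only for Unlink on a selection
without links; for Undo with nothing to undo they agree.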

Chris



[whatwg] Superset encodings [Re: ISO-8859-* and the C1 control range]

2008-03-12 Thread Øistein E. Andersen
On 5th June 2007, Øistein E. Andersen wrote:

 (To do this properly, what we really ought to do is look for
 C1 and undefined characters in all IANA charsets and semi-official
 mappings to Unicode and check 1) whether the gaps can be filled
 by borrowing from other encodings, and 2) whether browsers
 actually do so. [...])

I have finally got round to looking at superset encodings.

To do this, I started with Unicode mappings from [UNI] for 8-bit 1-byte
alphabet encodings and added mappings for other such encodings
implemented in Opera, Safari or Firefox, mostly from [CSETS], though
I made one for Windows-Sami-2 from a PDF.  (I then discovered that IE
had something called Arabic-ASMO, for which no matching specification 
could be found, and subsequently reverse-engineered all IE's encodings.
Most of these turned out to be identical to other mappings or only
add characters from the PUA, but some real differences were found,
and those are reported in the text below.)

[UNI] http://unicode.org/Public/MAPPINGS/
[CSETS] http://crl.nmsu.edu/~mleisher/csets.html

All the character repertoires and encoding vectors defined by the mappings
were then compared pairwise. (Codepoints mapped to C0, space, BS or C1
were treated as unassigned, and directionality indicators for Arabic and
Hebrew were ignored.) The result is quite a big and unreadable table
[FULL], so the repertoires and encodings were clustered, which gave rise to
the tables in [ENC], which compare charsets with less than 27 incompatible
codepoints, as well as those in [REP], which compare charsets with at most
60 characters not found in both repertoires. (The thresholds are arbitrary, but
more than sufficiently large to ensure that all related charsets will be
clustered together and at the same time sufficiently small to keep the
tables at a reasonable size.)

[FULL] http://coq.no/X/charset-table.html
[ENC] http://coq.no/X/charset-enc.html
[REP] http://coq.no/X/charset-rep.html
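
The pairwise comparison described above can be sketched roughly as
follows. This is a minimal Python illustration, not the script used to
produce the tables: it uses the codecs bundled with Python rather than
the mapping files from [UNI] and [CSETS], the function names are
hypothetical, and treating U+007F as unassigned is an assumption.

```python
ENC_THRESHOLD = 27  # [ENC]: fewer than 27 incompatible codepoints
REP_THRESHOLD = 60  # [REP]: at most 60 characters not found in both repertoires


def decode_table(encoding):
    """Map each byte value to its character, or None if treated as unassigned."""
    table = {}
    for byte in range(256):
        try:
            char = bytes([byte]).decode(encoding)
        except UnicodeDecodeError:
            char = None
        if char is not None and len(char) != 1:
            char = None
        # C0, space and C1 count as unassigned (assumption: U+007F as well).
        if char is not None and (ord(char) <= 0x20 or 0x7F <= ord(char) <= 0x9F):
            char = None
        table[byte] = char
    return table


def compare(enc_a, enc_b):
    """Return (incompatible codepoints, characters not in both repertoires)."""
    a, b = decode_table(enc_a), decode_table(enc_b)
    incompatible = {
        byte: (a[byte], b[byte])
        for byte in range(256)
        if a[byte] is not None and b[byte] is not None and a[byte] != b[byte]
    }
    rep_a = {c for c in a.values() if c is not None}
    rep_b = {c for c in b.values() if c is not None}
    return incompatible, rep_a ^ rep_b


incompatible, not_shared = compare("koi8_r", "koi8_u")
print(len(incompatible) < ENC_THRESHOLD, len(not_shared) <= REP_THRESHOLD)
```

A codepoint counts as incompatible only when both encodings assign it
and the assignments differ; a codepoint assigned in just one of them
affects the repertoire comparison but not the encoding comparison.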

A short summary of the most interesting/relevant results (supported by [ENC])
can be found below.

-- 
Øistein E. Andersen

PS: How should colour be added to tables like these in HTML5 with
neither of the attributes bgcolor and style?

PPS: Some right-to-left characters contaminate surrounding characters as I
 have not yet found a simple solution to make everything strictly
 left-to-right (probably because I have not looked for it properly).




========
Notation
========


x  y:  x is a proper subset of y


=====
ASCII
=====

Most of the charsets are ASCII-compatible; some are EBCDIC-based
(none of which are implemented in browsers, as far as I know).

The following are /almost/ ASCII-compatible:

CP864 uses Arabic per cent in place of the Latin sign.

JIS-201 replaces `reverse solidus' and `tilde' with `yen' and `macron'.

See below for PostScript / NextStep.
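
A quick way to check ASCII compatibility for a given charset is sketched
below. It is only an illustration using Python's bundled codecs, which
may differ from the mappings used for the tables; the function name is
hypothetical, and only CP864 is shown.

```python
def ascii_deviations(encoding):
    """Return {byte: character} for bytes 0x00-0x7F that do not decode as ASCII."""
    deviations = {}
    for byte in range(0x80):
        try:
            char = bytes([byte]).decode(encoding)
        except UnicodeDecodeError:
            char = None  # unassigned in this encoding
        if char != chr(byte):
            deviations[byte] = char
    return deviations


for byte, char in ascii_deviations("cp864").items():
    print(f"0x{byte:02X} -> U+{ord(char):04X}" if char else f"0x{byte:02X} unassigned")
```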



======================================
Arabic, including MacArabic / MacFarsi
======================================

Both MacArabic and MacFarsi are close to being supersets of 8859-6.

The Macintosh encodings encode explicitly right-to-left characters `dollar',
`space' and `hyphen' in place of ISO's `generic currency sign',
`non-breaking space' and `soft hyphen'.

MS IE's so-called ASMO-708 (not treated as an 8859-6 alias as per IANA)
appears to be another rough superset of 8859-6, adding accented lowercase
letters for French and box-drawing characters, but apparently without
soft hyphen or non-breaking space.

MS IE also includes Arabic-DOS, which appears to be different from all
other encodings.

Note: Similarly, IE apparently handles CS-ISO-2022-JP as distinct from
  ISO-2022-JP. This is something to keep in mind when looking at
  multi-byte encodings.



==========
Baltic Rim
==========

Despite what Wikipedia says, 8859-13 and CP1257 are not actually compatible;
the latter puts `acute accent' and `high dot' where the former has
`left double quotation mark' and `right single quotation mark'.
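
The difference can be spot-checked with Python's bundled codecs, as in
the sketch below. The byte positions 0xB4 and 0xFF are not stated above;
they are an assumption taken from those codecs, and the helper name is
hypothetical. The Unicode names printed may not match the informal
character names used in this message.

```python
import unicodedata


def describe(byte, encoding):
    """Return the Unicode name of the character a single byte decodes to."""
    return unicodedata.name(bytes([byte]).decode(encoding))


for byte in (0xB4, 0xFF):  # assumed positions of the two differing characters
    print(
        f"0x{byte:02X}: 8859-13 = {describe(byte, 'iso8859_13')}, "
        f"CP1257 = {describe(byte, 'cp1257')}"
    )
```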




============
Cyrillic KOI
============


There are several KOI8-based encodings, all of which include the basic
modern Russian alphabet (except yo) in an ASCII-compatible sequence.

KOI8-unified is almost a superset of ISO-IR-111, but uppercase and
lowercase Ukrainian `Cyrillic g with upturn' replace `generic currency
sign' and `soft hyphen'.

IE's KOI-8-U is different as it includes short uppercase and lowercase
y instead of two box-drawing characters.

Comments:  KOI8-RU (as opposed to KOI8-R and KOI8-U) is apparently obsolete
   and best forgotten.

   KOI8-unified shows all letters from any KOI8-based encoding
   correctly.  This one therefore seems like the best choice
   if distributional analysis indicates KOI-8 of some description.



========
Georgian
========


GEO-STD-8 and GEO-PS are mostly compatible, except that the former has
`No' where the latter has `y acute'.

(GEO-STD-8 is supposedly supported by Firefox, but does not seem to work

Re: [whatwg] Tree construction: parse error and plaintext

2008-03-12 Thread Ian Hickson
On Wed, 12 Mar 2008, Thomas Broyer wrote:

 In the "in body" insertion mode, shouldn't the end-of-file token case
 have special handling for when the current node is a plaintext element,
 and not generate a parse error in that case?
 
 The current behavior is that if you use plaintext, you'll have a parse 
 error at EOF. Is this intended?

Yeah. plaintext is invalid anyway, and I didn't want to add more stuff 
in the spec just to make it report 1 error instead of 2.
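
The "1 error instead of 2" point can be illustrated with a toy model.
This is only a sketch and not the spec's algorithm: the set of elements
allowed to remain open at end of file is a simplifying assumption, and
all names are hypothetical.

```python
ALLOWED_OPEN_AT_EOF = {"html", "body"}  # assumption: simplified for this sketch
INVALID_ELEMENTS = {"plaintext"}        # per the thread: plaintext is invalid anyway


def errors_at_eof(open_elements):
    """List the errors reported for a document that ends with these elements open."""
    errors = [f"invalid element: {name}"
              for name in open_elements if name in INVALID_ELEMENTS]
    # The end-of-file token is a parse error if anything unexpected is still open.
    if any(name not in ALLOWED_OPEN_AT_EOF for name in open_elements):
        errors.append("parse error: end of file with open elements")
    return errors


print(errors_at_eof(["html", "body", "plaintext"]))
```

Special-casing plaintext in the end-of-file check would drop the second
entry while the first would still be reported.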

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'