On Sun, Dec 28, 2008 at 5:57 PM, Brion Vibber <[email protected]> wrote:
> If we're going to stick with strict ASCII-limited anchors, it might be
> worth considering making them more legible, say with transliteration to
> ASCII Latin chars. :P
>
> On the other hand, XHTML *doesn't* actually limit us this way!
>
> The XHTML 1.0 recommendation of restriction to [A-Za-z][A-Za-z0-9:_.-]*
> is for compatibility with HTML 4.0, which defines:
>
>   ID and NAME tokens must begin with a letter ([A-Za-z]) and may be
>   followed by any number of letters, digits ([0-9]), hyphens ("-"),
>   underscores ("_"), colons (":"), and periods (".").
>
> XHTML specifies ID and NMTOKEN types here, which are *not* restricted
> to ASCII, but rather a large number of scripts:
>
> http://www.w3.org/TR/2000/WD-xml-2e-20000814#NT-NameChar
>
> http://www.w3.org/TR/2000/WD-xml-2e-20000814#NT-Letter
> http://www.w3.org/TR/2000/WD-xml-2e-20000814#NT-Digit
> http://www.w3.org/TR/2000/WD-xml-2e-20000814#NT-Extender

This sounds like an excellent idea.  I tried it in IE5 (on ies4linux),
Firefox 3, and Opera 9.something, and none of them had any problem
with this trivial test page:

http://www.twcenter.net/~simetrical/tests/unicode_anchor.html

The W3C validator is happy with it too.
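
For comparison, here's a rough Python sketch of the two rules side by
side: the HTML 4.0 token regex Brion quoted, and an approximation of the
XML Name rule. The function names are made up for illustration.  The XML
check is only approximate: it uses the Unicode general categories listed
in Appendix B of XML 1.0 (2nd ed.), looked up in Python's current Unicode
database rather than Unicode 2.0, and it ignores that appendix's
exclusions and every extender except U+00B7.

```python
import re
import unicodedata

# HTML 4.0 ID/NAME token rule, as quoted above.
HTML4_ID = re.compile(r"[A-Za-z][A-Za-z0-9:_.-]*")

# Approximation of the XML Name rule via Unicode general categories
# (assumption: categories from XML 1.0 2nd ed. Appendix B, current Unicode data).
START_CATEGORIES = {"Ll", "Lu", "Lo", "Lt", "Nl"}
NAME_CATEGORIES = START_CATEGORIES | {"Mc", "Me", "Mn", "Lm", "Nd"}


def is_html4_id(s: str) -> bool:
    """True if s matches the ASCII-only HTML 4.0 ID token rule."""
    return HTML4_ID.fullmatch(s) is not None


def is_xml_name_approx(s: str) -> bool:
    """Rough check of the XML Name rule; not a full implementation."""
    if not s:
        return False
    first, rest = s[0], s[1:]
    # First character: letter-like category, or '_' / ':'.
    if not (first in "_:" or unicodedata.category(first) in START_CATEGORIES):
        return False
    # Later characters may also be digits, combining marks, '.', '-', or MIDDLE DOT.
    return all(
        ch in "_:.-\u00b7" or unicodedata.category(ch) in NAME_CATEGORIES
        for ch in rest
    )
```

So something like "Café" fails the HTML 4.0 rule but passes the XML one,
which is the whole point of Brion's suggestion.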

Of course, we still *do* have to ensure that IDs don't start with any
of the following:

"-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]

The ones specified as Unicode code points are the combining diacritical
marks (U+0300 to U+036F), the two tie characters U+203F UNDERTIE and
U+2040 CHARACTER TIE, and, strangely, the character · MIDDLE DOT.
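
A minimal Python check for that start-of-ID restriction might look like
the following (the function name is hypothetical).  It only covers the
characters listed above, which as far as I can tell are exactly the ones
the 5th edition of XML 1.0 allows in NameChar but not in NameStartChar,
so passing it says nothing about whether the rest of the ID is valid.

```python
def bad_first_char(anchor: str) -> bool:
    """True if the anchor can't be used as-is because of its first
    character (or because it's empty)."""
    if not anchor:
        return True  # an empty string isn't a valid ID either
    cp = ord(anchor[0])
    return (
        anchor[0] in "-."
        or 0x30 <= cp <= 0x39        # ASCII digits 0-9
        or cp == 0xB7                # MIDDLE DOT
        or 0x0300 <= cp <= 0x036F    # combining diacritical marks
        or 0x203F <= cp <= 0x2040    # UNDERTIE, CHARACTER TIE
    )
```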

There are also still a bunch of characters that aren't allowed in IDs,
period.  I'd assume that covers stuff like whitespace, some punctuation,
and reserved characters, although I didn't look closely at the classes
in question.  And, of course, most ASCII punctuation is still not
allowed.  I guess we can keep our dot-encoding for this, although if
so, we should encode dots as well: currently the encoding is lossy,
which is unnecessary.  (Actually, you'd have to fix the "prepend x"
solution too, since that adds more lossiness.)
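
Here's a rough Python sketch of what a lossless dot-encoding could look
like.  It is not the current MediaWiki code.  The function names, and the
choice of which characters pass through unescaped (ASCII letters and
digits, "_", "-", and non-ASCII characters Python considers alphabetic),
are assumptions for illustration.  Every other character, "." included,
becomes ".XX" per UTF-8 byte, so every "." in the output starts an escape
and decoding is unambiguous.  It doesn't touch the "prepend x" problem:
an anchor whose first character can't start an ID (a digit, or the "."
of a leading escape) still needs separate handling.

```python
def encode_anchor(title: str) -> str:
    """Hypothetical lossless dot-encoding of a heading into an anchor."""
    out = []
    for ch in title:
        # Assumption: these pass through unescaped.  Python's isalpha() is
        # broader than XML 1.0's Letter class, so this is only approximate.
        if (ch.isascii() and (ch.isalnum() or ch in "_-")) or (
            not ch.isascii() and ch.isalpha()
        ):
            out.append(ch)
        else:
            # Everything else, including '.', becomes .XX per UTF-8 byte.
            out.extend(f".{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


def decode_anchor(anchor: str) -> str:
    """Invert encode_anchor; assumes well-formed input."""
    buf = bytearray()
    i = 0
    while i < len(anchor):
        if anchor[i] == ".":
            # A '.' always introduces exactly two hex digits.
            buf.append(int(anchor[i + 1:i + 3], 16))
            i += 3
        else:
            buf.extend(anchor[i].encode("utf-8"))
            i += 1
    return buf.decode("utf-8")
```

With this, "a.b" encodes to "a.2Eb" while a literal "a.2Eb" encodes to
"a.2E2Eb", so the two no longer collide.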
_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
