[Breaking this thread off...]

On 12/28/08 1:32 AM, Niklas Laxström wrote:
> The anchors of non-latin headers are already (latin) gibberish:
> #.D0.A4.D0.B8.D0.BB.D1.8C.D0.BC.D0.BE.D0.B3.D1.80.D0.B0.D1.84.D0.B8.D1.8F
>
> It doesn't seem reasonable to think that people could create anchors
> in their head from text, except in special cases.
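For reference, the '.XX' anchor quoted above looks like the heading's UTF-8 bytes, percent-encoded, with each '%' then swapped for '.'. Here is a minimal Python sketch of that scheme. It assumes spaces become underscores, and `legacy_anchor` is a hypothetical helper name. It is an approximation for illustration, not MediaWiki's actual sanitizer code:

```python
from urllib.parse import quote


def legacy_anchor(heading: str) -> str:
    """Approximate the HTML 4-compatible '.XX' anchor form (hypothetical helper)."""
    # Assumption: spaces in section headings are turned into underscores first.
    text = heading.replace(" ", "_")
    # Percent-encode the UTF-8 bytes, leaving ASCII letters, digits and the
    # punctuation HTML 4 allows in IDs ('_', ':', '.', '-') untouched.
    encoded = quote(text, safe="_:.-")
    # '%' is not a legal ID character in HTML 4, so swap it for '.'.
    return encoded.replace("%", ".")
```

Under these assumptions, `legacy_anchor("Фильмография")` reproduces the anchor quoted above.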

If we're going to stick with strict ASCII-limited anchors, it might be 
worth considering making them more legible, say with transliteration to 
ASCII Latin chars. :P
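Here is a rough sketch of the transliteration idea. It uses an ad hoc Russian-to-Latin table, and both the table and the helper name `transliterated_anchor` are hypothetical. They are not an agreed scheme. Anything the table doesn't cover falls back to the '.XX' byte form:

```python
from urllib.parse import quote

# Illustrative Russian-to-Latin table (hypothetical; not a formal standard).
CYRILLIC_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}


def transliterated_anchor(heading: str) -> str:
    """Build an ASCII-only anchor, transliterating Russian letters where possible."""
    out = []
    for ch in heading.replace(" ", "_"):
        lower = ch.lower()
        if lower in CYRILLIC_TO_LATIN:
            latin = CYRILLIC_TO_LATIN[lower]
            # Preserve capitalization of the first Latin letter.
            out.append(latin.capitalize() if ch.isupper() else latin)
        elif ch.isascii() and (ch.isalnum() or ch in "_:.-"):
            out.append(ch)
        else:
            # Unmapped characters fall back to the '.XX' byte encoding.
            out.append(quote(ch, safe="").replace("%", "."))
    return "".join(out)
```

With this table, "Уплисцихе в средневековье" becomes "Uplistsikhe_v_srednevekove". The mapping is one-way and can collide (both "е" and "э" map to "e"), so a real version would need some de-duplication of anchors.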

On the other hand, XHTML *doesn't* actually limit us this way!

XHTML 1.0's recommendation to restrict IDs to [A-Za-z][A-Za-z0-9:_.-]*
is there for compatibility with HTML 4.0, which defines:

   ID and NAME tokens must begin with a letter ([A-Za-z]) and may be
   followed by any number of letters, digits ([0-9]), hyphens ("-"),
   underscores ("_"), colons (":"), and periods (".").
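The HTML 4 rule quoted above fits in a single regular expression. Here is a small Python sketch of it (`is_html4_id` is a hypothetical helper name):

```python
import re

# HTML 4 ID/NAME token: an ASCII letter, then any number of ASCII letters,
# digits, hyphens, underscores, colons or periods.
HTML4_ID = re.compile(r"[A-Za-z][A-Za-z0-9:_.-]*")


def is_html4_id(token: str) -> bool:
    """Return True if token satisfies the HTML 4 ID/NAME rule."""
    return HTML4_ID.fullmatch(token) is not None
```

Any non-ASCII character fails this check, which is why Cyrillic headings have to be escaped under the HTML 4 rule.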

XHTML specifies ID and NMTOKEN types here, which are *not* restricted
to ASCII but allow characters from a large number of scripts:

http://www.w3.org/TR/2000/WD-xml-2e-20000814#NT-NameChar

http://www.w3.org/TR/2000/WD-xml-2e-20000814#NT-Letter
http://www.w3.org/TR/2000/WD-xml-2e-20000814#NT-Digit
http://www.w3.org/TR/2000/WD-xml-2e-20000814#NT-Extender
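Here is a rough Python approximation of the XML Name and NMTOKEN tests. XML 1.0's Appendix B describes its Letter and Digit tables roughly in terms of Unicode general categories, and this sketch leans on that using `unicodedata`. Python's Unicode version is much newer than the one the XML tables were built from, so edge cases will differ. The helper names are hypothetical:

```python
import unicodedata

# Approximations of the XML 1.0 character classes via Unicode categories.
NAME_START_CATEGORIES = {"Ll", "Lu", "Lo", "Lt", "Nl"}
NAME_EXTRA_CATEGORIES = {"Mc", "Me", "Mn", "Lm", "Nd"}


def is_name_start(ch: str) -> bool:
    # Letters (any script), '_' and ':' may start a Name.
    return ch in "_:" or unicodedata.category(ch) in NAME_START_CATEGORIES


def is_name_char(ch: str) -> bool:
    # Name characters add digits, combining marks, modifiers, '.', '-'
    # and U+00B7 MIDDLE DOT (one of the XML Extender characters).
    return (is_name_start(ch) or ch in ".-\u00b7"
            or unicodedata.category(ch) in NAME_EXTRA_CATEGORIES)


def is_xml_name(s: str) -> bool:
    """Approximate check that s matches the XML Name production (used by ID)."""
    return bool(s) and is_name_start(s[0]) and all(is_name_char(c) for c in s[1:])


def is_xml_nmtoken(s: str) -> bool:
    """Approximate check that s matches the XML Nmtoken production."""
    return bool(s) and all(is_name_char(c) for c in s)
```

Under this approximation, "Уплисцихе_в_средневековье" is a valid ID, while the HTML 4 rule rejects it.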

If there are no major browser compatibility problems, I would probably
recommend we roll back the nasty old .XX encoding, which exists for HTML 4
compatibility, in which case we could quite legally produce something
direct, such as:

http://ru.wikipedia.org/wiki/Уплисцихе#Уплисцихе_в_средневековье

which URL-encodes out to:

http://ru.wikipedia.org/wiki/%D0%A3%D0%BF%D0%BB%D0%B8%D1%81%D1%86%D0%B8%D1%85%D0%B5#%D0%A3%D0%BF%D0%BB%D0%B8%D1%81%D1%86%D0%B8%D1%85%D0%B5_%D0%B2_%D1%81%D1%80%D0%B5%D0%B4%D0%BD%D0%B5%D0%B2%D0%B5%D0%BA%D0%BE%D0%B2%D1%8C%D0%B5

(modern browsers can display this as readable Unicode in the URL bar)
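Here is a small Python sketch of that round trip. `section_url` is a hypothetical helper name. Standard percent-encoding of the UTF-8 bytes produces the wire form, and decoding it gives back the readable form:

```python
from urllib.parse import quote, unquote

BASE = "http://ru.wikipedia.org/wiki/"


def section_url(title: str, section: str) -> str:
    """Build a page URL with a direct Unicode section fragment (hypothetical helper)."""
    # quote() percent-encodes the UTF-8 bytes; '_' is unreserved and stays as-is.
    return BASE + quote(title) + "#" + quote(section)


url = section_url("Уплисцихе", "Уплисцихе_в_средневековье")
# Decoding yields the readable form a browser can show in its URL bar.
pretty = unquote(url)
```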

as opposed to the current:

http://ru.wikipedia.org/wiki/%D0%A3%D0%BF%D0%BB%D0%B8%D1%81%D1%86%D0%B8%D1%85%D0%B5#.D0.A3.D0.BF.D0.BB.D0.B8.D1.81.D1.86.D0.B8.D1.85.D0.B5_.D0.B2_.D1.81.D1.80.D0.B5.D0.B4.D0.BD.D0.B5.D0.B2.D0.B5.D0.BA.D0.BE.D0.B2.D1.8C.D0.B5
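To see what the current fragment actually says, the '.XX' form can be reversed by turning each '.XX' back into '%XX' and percent-decoding. Here is a sketch (`decode_legacy_anchor` is a hypothetical helper name). It assumes literal periods in headings are not escaped. If so, a heading containing a period followed by two hex digits would be misdecoded, because the form cannot tell the two cases apart:

```python
import re
from urllib.parse import unquote


def decode_legacy_anchor(anchor: str) -> str:
    """Reverse the '.XX' anchor form back to readable text (hypothetical helper)."""
    # Turn each '.XX' hex pair back into '%XX', then percent-decode as UTF-8.
    return unquote(re.sub(r"\.([0-9A-F]{2})", r"%\1", anchor))
```

Applied to the fragment of the current URL above, it yields "Уплисцихе_в_средневековье".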

-- brion

_______________________________________________
Wikitech-l mailing list
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
