https://bugzilla.wikimedia.org/show_bug.cgi?id=21429

--- Comment #5 from Philippe Verdy <verd...@wanadoo.fr> 2011-01-31 21:17:33 UTC ---
OK, but bug 9413 only covered the presentation forms of letters (i.e. the
distinction of *letters* between initial, medial, final, and isolated forms).
The shadda is not a letter and may be inserted anywhere within a word as a
presentational feature. Because it is presentational, replacing it with its
compatibility mapping changes exactly its presentational semantics.

If the purpose was to convey a single meaning, it should have been stripped
completely. When U+FC61 appears, it is used in isolation, where its expected
width and appearance are important. Changing it will alter its width, and the
KASRA may not fit very well.
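
As a rough illustration of the compatibility mapping discussed above, the
following Python sketch uses the standard unicodedata module to show what
U+FC61 turns into under NFKC. The helper name describe() is made up for this
example, and the names and code points are looked up at run time from the
Unicode data shipped with the interpreter rather than assumed here. Nothing in
it is specific to the normalizer that MediaWiki itself uses.

```python
import unicodedata

def describe(text):
    """Return (code point, character name, combining class) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), unicodedata.combining(ch))
        for ch in text
    ]

original = "\uFC61"
decomposed = unicodedata.normalize("NFKC", original)

print(describe(original))
# Raw compatibility mapping as recorded in the Unicode character database.
print(unicodedata.decomposition(original))
print(describe(decomposed))
```

The single code point becomes a short sequence: a base character followed by
combining marks, which is why the rendered width can differ.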

But maybe font renderers are now capable of handling the decomposed sequence
and generating exactly what U+FC61 displays when it is mapped in a font (such a
mapping is not required in any Arabic font, even though most of those fonts do
include it).

I'm not sure this is a big issue. What is the problem if we cannot see the
difference? The only visible effect is when editing, where you will have to
press BACKSPACE twice instead of once to delete it completely in insert mode
(there is no difference when you select it with the mouse).

The only case where it could make some difference is when U+FC61 is followed
by another Arabic diacritic (due to the canonical reordering applied after the
compatibility decomposition). This does not change the BiDi behavior or the
joining behavior, even if there are spaces or punctuation on both sides.
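
The canonical reordering mentioned above can be observed with a small Python
sketch. It appends ARABIC FATHATAN (U+064B) after U+FC61; that mark is chosen
only because it has a low canonical combining class (27), not because the
resulting sequence is meaningful Arabic. The function name nfkc_codepoints()
is hypothetical.

```python
import unicodedata

def nfkc_codepoints(text):
    """Normalize to NFKC and list each code point with its combining class."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.combining(ch))
        for ch in unicodedata.normalize("NFKC", text)
    ]

FATHATAN = "\u064B"  # canonical combining class 27

alone = nfkc_codepoints("\uFC61")
followed = nfkc_codepoints("\uFC61" + FATHATAN)

print(alone)
# The trailing mark is sorted in among the marks produced by the decomposition,
# so it no longer comes last in the normalized string.
print(followed)
```

Before normalization the extra diacritic follows the ligature; afterwards it
sits between the base character and the marks that came out of U+FC61.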

If it ever appears in the middle of a word, however, this will change its
appearance, because the decomposition and the joining type will alter its form.
I doubt that such cases exist in normal Arabic. This could be an issue in IDNA
domain names if this compatibility character were not mandatorily mapped to the
normal shadda+diacritic (just like the other Arabic compatibility presentation
forms), but it merits some investigation to check that this is actually the
case with the newer IDNA RFCs and Unicode papers about IDNA (which have relaxed
some rules to allow more characters that were restricted before).

But if this causes any problem in a URL inserted as the target of an external
link, one could still use the "xn--" notation in the hidden URL. I also have
serious doubts that a URL with compatibility characters in its domain name
would be harmless (it would most probably be a cybersquatting domain). By
contrast, such a character could be valid and distinct within the query string,
anchor, or path part of the URL, for example in a link to a site detailing the
Unicode properties of this compatibility character (but maybe there is still a
way to encode the URL specially).
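
For the path, query string, or anchor part, one ordinary way to encode such a
character is percent-encoding of its UTF-8 bytes. The Python sketch below uses
urllib.parse from the standard library; the host example.org and the /wiki/
path are placeholders only. Whether the software receiving the URL normalizes
the text after decoding is outside the scope of this sketch.

```python
import unicodedata
from urllib.parse import quote, unquote

char = "\uFC61"

# Percent-encode the UTF-8 bytes of the character for use outside the host part.
encoded = quote(char)
url = "https://example.org/wiki/" + encoded  # placeholder host and path
print(url)

# Decoding restores the original code point, untouched by normalization.
decoded = unquote(encoded)
print(decoded == char)

# Applying NFKC afterwards would replace it with the decomposed sequence.
print(unicodedata.normalize("NFKC", decoded) == char)
```

The encoded form carries only ASCII, so the compatibility character stays
distinct until something on the path explicitly normalizes it.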

Anyway, all those Arabic compatibility characters are really not recommended
within any part of a stable URL, and Arabic keyboards have long since stopped
generating them in any decent browser. They are most probably detected by
browsers or security suites as dangerous if ever found in a URL, notably if
they appear in the domain name part, in some IDNA-enabled registry or private
subregistry that does not restrict those characters in its DNS records. In that
case the browser or its security extension would offer the user the choice to
follow the link with the normal characters, to cancel the navigation and go
back, or to confirm, after being warned, that he really wants to go there.
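
A check of the kind described above could look like the following Python
sketch. It is only an assumption about how such detection might work, not the
behavior of any particular browser or security suite: it flags every character
whose compatibility decomposition is tagged as a positional presentation form.
The function name presentation_forms() is hypothetical.

```python
import unicodedata

# Decomposition tags that mark positional presentation forms.
PRESENTATION_TAGS = {"<isolated>", "<initial>", "<medial>", "<final>"}

def presentation_forms(text):
    """Return (index, code point) for each presentation-form character in text."""
    found = []
    for index, ch in enumerate(text):
        mapping = unicodedata.decomposition(ch)
        # The tag, when present, is the first space-separated field.
        tag = mapping.split(" ", 1)[0] if mapping else ""
        if tag in PRESENTATION_TAGS:
            found.append((index, f"U+{ord(ch):04X}"))
    return found

print(presentation_forms("http://example.org/\uFC61"))
print(presentation_forms("http://example.org/\u0651"))  # plain shadda: nothing flagged
```

A caller could then warn the user, or offer the NFKC-normalized URL instead,
whenever the returned list is not empty.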
