[Bug 21429] Arabic double diacritics presentation

2013-08-24 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=21429

Stephen G. Brown sgb-wob...@sbcglobal.net changed:

   What|Removed |Added

 CC||sgb-wob...@sbcglobal.net

--- Comment #6 from Stephen G. Brown sgb-wob...@sbcglobal.net ---
This bug was first reported in Bug 2399 - Unicode normalization sorts
Hebrew/Arabic/Myanmar vowels wrongly.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 21429] Arabic double diacritics presentation

2012-04-12 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=21429

Krinkle krinklem...@gmail.com changed:

   What|Removed |Added

Version|1.16|1.16.x



[Bug 21429] Arabic double diacritics presentation

2011-09-12 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=21429

Siebrand s.mazel...@xs4all.nl changed:

   What|Removed |Added

 AssignedTo|wikibugs-l@lists.wikimedia.org |amir.ahar...@mail.huji.ac.il



[Bug 21429] Arabic double diacritics presentation

2011-01-31 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=21429

Siebrand s.mazel...@xs4all.nl changed:

   What|Removed |Added

   Keywords||i18n
 CC||s.mazel...@xs4all.nl



[Bug 21429] Arabic double diacritics presentation

2011-01-31 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=21429

--- Comment #5 from Philippe Verdy verd...@wanadoo.fr 2011-01-31 21:17:33 UTC 
---
OK, but bug 9413 was only about the presentation forms of letters (i.e. the
distinction of *letters* between initial, medial, final, and isolated forms).
The Shadda is not a letter and may be inserted at any place within a word as a
presentational feature. As it is presentational, changing it through the
compatibility mapping changes exactly its presentational semantics.

If the purpose was to convey a single meaning, it should have been stripped
completely. When U+FC61 appears, it is used in isolation, where its expected
width and appearance are important. Changing it will alter its width, and the
DAMMA may not fit very well.

But maybe font renderers are now capable of handling the sequence and
generating exactly what U+FC61 displays when it is mapped in a font (such a
mapping is not required in any Arabic font, even though most fonts include
it).

I'm not sure this is a big issue. What is the problem if we cannot see the
difference, except when editing, where you'll press BACKSPACE twice instead of
once to delete it completely in insert mode (there is no difference when you
select it with the mouse)?

The only case where it could make a difference is when U+FC61 is followed by
another Arabic diacritic (because of canonical reordering after the
compatibility decomposition has been applied). This does not change the BiDi
behavior or the joining behavior, even if there are spaces or punctuation on
both sides.
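
The reordering case can be sketched in Python with the standard unicodedata
module. The trailing FATHA (U+064E) is an arbitrary diacritic chosen for
illustration, and NFKC stands in here for "compatibility decomposition plus
canonical reordering"; this is not a claim about what MediaWiki itself runs.

```python
import unicodedata

# U+FC61 followed by one more diacritic. FATHA (U+064E) is an assumption
# made for this example; any mark with a lower combining class would do.
text = "\uFC61\u064E"

def code_points(s):
    """Return the string as a list of U+XXXX labels."""
    return [f"U+{ord(ch):04X}" for ch in s]

nfc = unicodedata.normalize("NFC", text)
nfkc = unicodedata.normalize("NFKC", text)

# NFC keeps the compatibility character, so nothing is reordered.
print(code_points(nfc))   # ['U+FC61', 'U+064E']

# NFKC decomposes U+FC61 to SPACE, DAMMA, SHADDA; the FATHA (class 30)
# then sorts ahead of DAMMA (class 31) and SHADDA (class 33).
print(code_points(nfkc))  # ['U+0020', 'U+064E', 'U+064F', 'U+0651']
```

The FATHA that followed the ligature in the input ends up before the marks
that came out of the ligature.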

If it ever appears in the middle of a word, however, this will change its
appearance, because the decomposition and the joining type will alter its form.
I doubt that such cases exist in normal Arabic. This could be an issue in IDNA
domain names if this compatibility character were not mandatorily mapped to
the normal shadda+diacritic (just like other Arabic compatibility presentation
forms), but it merits some investigation to check that this is effectively the
case with the newer IDNA RFCs and Unicode papers about IDNA (which have relaxed
some rules to allow more characters that were restricted before).

But if this causes any problem in a URL inserted as the target of an external
link, one could still use the xn-- notation in the hidden URL. But I also
have serious doubts that a URL with compatibility characters in its domain
would be harmless (most probably a cybersquatting domain), whereas it could be
valid and distinct within the query string, anchor, or path part of the URL,
for example as a link to a site detailing the Unicode properties of this
compatibility character; but maybe there's a way to still encode the URL
specially.
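
As a rough illustration of the xn-- notation, Python's built-in "punycode"
codec can encode a label without normalizing it first. The label below is
hypothetical, and this sketch skips the mapping and validation steps of a
real IDNA implementation, which may well map or reject U+FC61.

```python
# Hypothetical label: ALIF, LAM, HAH, REH followed by the ligature U+FC61.
label = "\u0627\u0644\u062D\u0631\uFC61"

# Raw Punycode only; no IDNA mapping, normalization, or validity checks.
ace = "xn--" + label.encode("punycode").decode("ascii")

# Decoding the ASCII form gives back exactly the original code points,
# compatibility character included.
restored = ace[len("xn--"):].encode("ascii").decode("punycode")

print(ace)
print(restored == label)  # True
```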

Anyway, all those Arabic compatibility characters are really not recommended
within any part of a stable URL, and have long since ceased to be generated by
Arabic keyboards in any decent browser. If found in a URL, they are most
probably flagged as dangerous by browsers or security suites: the browser or
its security extension will offer to follow the link with the normal
characters, to cancel the navigation and go back, or to confirm, after a
warning, that the user really wants to go there. This applies notably if they
appear in the domain name part, in some IDNA-enabled registry or private
subregistry that does not restrict those characters in its DNS records.



[Bug 21429] Arabic double diacritics presentation

2010-06-07 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=21429

Tim Starling tstarl...@wikimedia.org changed:

   What|Removed |Added

 CC||tstarl...@wikimedia.org

--- Comment #4 from Tim Starling tstarl...@wikimedia.org 2010-06-08 04:11:20 
UTC ---
Normalisation of the Arabic presentation forms was requested by members of the
Arabic Wikipedia community. I recorded the request at bug 9413 and later
implemented it.



[Bug 21429] Arabic double diacritics presentation

2009-11-19 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=21429


Philippe Verdy verd...@wanadoo.fr changed:

   What|Removed |Added

 CC||verd...@wanadoo.fr




--- Comment #3 from Philippe Verdy verd...@wanadoo.fr  2009-11-19 19:48:24 
UTC ---
Isn't U+FC61 a compatibility character, which is excluded from decomposition
and recomposition under the NFD/NFC canonical equivalences?

If some Arabic fonts do not support two successive diacritics as recommended by
Unicode, and only support the decomposable compatibility characters, these
fonts are really bogus and should be avoided. But the problem is not there,
see below.

If the character is not a canonical equivalent to the two diacritics, it must
not be altered (even if it's not recommended).
In other words, MediaWiki must just apply the NFC normalization, but NOT the
NFKC normalisation.

When I look at the UCD, it reveals that U+FC61 decomposes as [isolated] U+0020
U+064F U+0651

Which means that this is just a compatibility decomposition, and not a
canonical decomposition (note also that the decomposition adds an extra space,
which in newer documents should rather be a non-breaking space instead of a
regular space, to avoid side effects that are possible with whitespace
compressions in HTML and XML). Note also that the space still prohibits
reordering.
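
The UCD entry and the NFC/NFKC difference can be checked with Python's
standard unicodedata module. This is a minimal sketch; the helper name
code_points is made up for the example, and the exact data depends on the
Unicode version bundled with the interpreter.

```python
import unicodedata

LIGATURE = "\uFC61"  # ARABIC LIGATURE SHADDA WITH DAMMA ISOLATED FORM

def code_points(s):
    """Return the string as a list of U+XXXX labels."""
    return [f"U+{ord(ch):04X}" for ch in s]

# A tag in angle brackets marks a compatibility (non-canonical) mapping.
decomposition = unicodedata.decomposition(LIGATURE)
print(decomposition)      # <isolated> 0020 064F 0651

nfc = unicodedata.normalize("NFC", LIGATURE)
nfkc = unicodedata.normalize("NFKC", LIGATURE)

print(code_points(nfc))   # ['U+FC61']  (left untouched)
print(code_points(nfkc))  # ['U+0020', 'U+064F', 'U+0651']
```

Only the compatibility forms (NFKC/NFKD) replace the character; the canonical
forms (NFC/NFD) leave it as it is.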

I see no reason, then, why MediaWiki would choose to convert U+FC61 incorrectly
to U+064F U+0651 (stripping the [isolated] compatibility specifier and one
space).

And also no reason why it would recombine U+064F U+0651 (adding the leading
space and a nonexistent [isolated] form) into U+FC61 in the editor.

The same reasoning should apply to all the other Arabic compatibility
characters (with implicit letter forms), which should be avoided in actual
Arabic text unless there is a strong reason to display the character in
isolation with a specific form distinct from the normal Arabic presentation
rules.




[Bug 21429] Arabic double diacritics presentation

2009-11-07 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=21429


Niklas Laxström niklas.laxst...@gmail.com changed:

   What|Removed |Added

 CC||niklas.laxst...@gmail.com




--- Comment #1 from Niklas Laxström niklas.laxst...@gmail.com  2009-11-07 
19:03:09 UTC ---
What is the bug? All text is converted to some normalisation form.




[Bug 21429] Arabic double diacritics presentation

2009-11-07 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=21429





--- Comment #2 from Arif alf...@ymail.com  2009-11-08 07:16:53 UTC ---
Oops, sorry. I meant in the edit box. The result is fine, since both sequences
are converted to the correct character. But not in the edit box. For example, I
wrote: ARABIC LETTER ALIF, ARABIC LETTER LAM, ARABIC LETTER HAH, ARABIC LETTER
REH, U+0651 ARABIC SHADDA, U+064F ARABIC DAMMA. In the edit box, the double
diacritics are converted to U+FC61 ARABIC LIGATURE SHADDA WITH DAMMA
ISOLATED FORM. Whenever I click "Save page" or "Show preview", the source
becomes: ARABIC LETTER ALIF, ARABIC LETTER LAM, ARABIC LETTER HAH, ARABIC
LETTER REH, U+064F, U+0651. This time, there's no U+FC61 character that I
expected to see.
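
The swap from U+0651 U+064F to U+064F U+0651 on save matches ordinary
canonical reordering, which can be reproduced with Python's standard
unicodedata module. This sketch only shows the Unicode behaviour; it assumes
nothing about where in MediaWiki or the browser the conversion happens.

```python
import unicodedata

# ALIF, LAM, HAH, REH, then SHADDA (U+0651) and DAMMA (U+064F),
# in the order typed in the report above.
typed = "\u0627\u0644\u062D\u0631\u0651\u064F"

saved = unicodedata.normalize("NFC", typed)

# Marks are sorted by canonical combining class, lowest first.
damma_class = unicodedata.combining("\u064F")
shadda_class = unicodedata.combining("\u0651")
print(damma_class, shadda_class)                # 31 33

# After normalization DAMMA precedes SHADDA, and no U+FC61 is produced.
print([f"U+{ord(c):04X}" for c in saved[-2:]])  # ['U+064F', 'U+0651']
```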

