https://bugzilla.wikimedia.org/show_bug.cgi?id=53754

Siddhartha Ghai <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[email protected]
            Summary|Marathi/Devanagari:Backspac |VisualEditor: Devanagari:
                   |e deletes combined          |Backspace deletes combined
                   |character clusters together |character clusters together
                   |with anuswara diacritics    |with diacritics

--- Comment #1 from Siddhartha Ghai <[email protected]> ---
Per Bug 51472#c4, grapheme cluster handling for backspace is to be handled on a
per-script basis. So, this should be treated as the bug specifically for
Devanagari.

Also note that I am confirming the bug for Hindi.

To further clarify the original report, Devanagari has various diacritics which
can be applied to base Unicode characters. It also has a combining character,
halant (viram) ् (U+094D).

Currently, pressing backspace after a grapheme cluster containing one or more
base characters with one or more diacritics and/or combining characters deletes
the entire grapheme cluster. This is not the desired behaviour. Pressing delete
before a cluster deletes the entire cluster. This is the desired behaviour.

Examples of diacritics: ँ (Chandrabindu) U+0901 ं (Bindu) U+0902 etc.

Examples of grapheme clusters:

One base character with one diacritic: कं ( क + ं ), कँ ( क + ँ ), कः ( क + ः )

One base character with multiple diacritics: किं ( क + ि + ं )

Multiple base characters with halant: श्र ( श + ् + र ), क्ष ( क + ् + ष ), प्र
( प + ् + र )

Multiple base characters with halant followed by diacritics: श्रिं (श + ् + र +
ि + ं), क्षि ( क + ् + ष + ि ), प्रे ( प + ् + र + े )
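To make the decomposition above easier to check, here is a minimal sketch in Python that lists the code points of a cluster. The helper name `describe` is hypothetical and only for illustration; the sequence is written with escapes so it cannot be altered by the mail client.

```python
import unicodedata

def describe(cluster: str) -> list[str]:
    """Return one 'U+XXXX NAME' entry per code point in the cluster."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in cluster]

# श्रिं from the example above: SHA + halant + RA + vowel sign I + bindu
shrim = "\u0936\u094D\u0930\u093F\u0902"

for entry in describe(shrim):
    print(entry)
```

The cluster is rendered as a single unit, but it is stored as five separate Unicode characters.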

System environment:
Win7 X64
Google Chrome 29.0.1547.62 m
Page used for testing: [[:w:hi:User:Siddhartha Ghai/sandbox]]

Expected behaviour:
Only the last Unicode character of the grapheme cluster (usually the final
diacritic) is to be deleted. The rest of the grapheme cluster is to stay intact.

Examples used (not exhaustive):
Grapheme -> Grapheme after pressing backspace
कं -> क
कँ -> क
कः -> क
क् -> क
किं -> कि
श्र -> श्
क्ष -> क्
प्र -> प्
श्रिं -> श्रि
क्षि -> क्ष
प्रे -> प्र
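The expected results above amount to removing exactly one code point per backspace. A minimal sketch in Python, where strings are sequences of code points; the name `backspace` is a hypothetical helper, not anything taken from VE:

```python
def backspace(text: str) -> str:
    """Drop only the last code point, leaving the rest of the cluster intact."""
    return text[:-1]

# (before, after) pairs taken from the table above, written with escapes
EXPECTED = [
    ("\u0915\u0902", "\u0915"),                    # KA + bindu -> KA
    ("\u0915\u094D", "\u0915"),                    # KA + halant -> KA
    ("\u0915\u093F\u0902", "\u0915\u093F"),        # KA + I + bindu -> KA + I
    ("\u0936\u094D\u0930", "\u0936\u094D"),        # SHA + halant + RA -> SHA + halant
    ("\u0936\u094D\u0930\u093F\u0902",
     "\u0936\u094D\u0930\u093F"),                  # drops only the bindu
    ("\u092A\u094D\u0930\u0947",
     "\u092A\u094D\u0930"),                        # drops only the vowel sign E
]

for before, after in EXPECTED:
    assert backspace(before) == after
```

This only illustrates the requested result on plain strings; how the editor maps a cursor position to code points is a separate question.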

Current behaviour (blank indicates entire grapheme cluster was removed) (these
results should be verified on other browser/OS combinations):
कं -> 
कँ -> 
कः -> 
क् -> 
किं ->
श्र -> श्  (Working correctly)
क्ष -> क् (Working correctly)
प्र -> प् (Working correctly)
श्रिं -> श् (Deletes र + ि + ं , i.e. three Unicode characters instead of one)
क्षि -> क् (Deletes ष + ि , i.e. two Unicode characters instead of one)
प्रे -> प् (Deletes र + े , i.e. two Unicode characters instead of one)

Points to note:
Some IMEs may provide non-normalized input for characters such as फ़ (U+095E)
in place of फ (U+092B) + ़ (U+093C), ढ़ (U+095D) in place of ढ (U+0922) + ़
(U+093C) etc. In such cases, the user may expect that pressing backspace will
only remove the diacritic, not the entire grapheme. So, VE may have to
handle normalization in such cases.
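As a rough illustration of the normalization point, the sketch below decomposes the text before removing the last code point. It uses Python's standard `unicodedata.normalize`; the helper name `backspace_normalized` is hypothetical, and whether VE should normalize the whole text or only the cluster before the cursor is left open here.

```python
import unicodedata

def backspace_normalized(text: str) -> str:
    """Decompose precomposed letters (NFD), then drop the last code point."""
    decomposed = unicodedata.normalize("NFD", text)
    return decomposed[:-1]

fa_precomposed = "\u095E"   # single code point for FA
rha_precomposed = "\u095D"  # single code point for RHA

# After decomposition, each becomes base letter + nukta (U+093C)
assert unicodedata.normalize("NFD", fa_precomposed) == "\u092B\u093C"
assert unicodedata.normalize("NFD", rha_precomposed) == "\u0922\u093C"

# Backspace then removes only the nukta and keeps the base letter
assert backspace_normalized(fa_precomposed) == "\u092B"
assert backspace_normalized(rha_precomposed) == "\u0922"
```

Without the decomposition step, removing the last code point of U+095E would delete the whole letter, because there is no separate diacritic to remove.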

Results seem to indicate that halant is handled correctly only in part:
letter + halant + letter + backspace correctly gives letter + halant, but
letter + halant + backspace deletes the entire grapheme instead of leaving
the letter.

The remaining diacritics as of Unicode 3.0 come under Nonspacing mark (Mn) and
Spacing combining mark (Mc). (Note: this does not include Devanagari Extended,
added in Unicode 6.0, and Vedic Extensions, added in Unicode 6.1.)
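The general categories can be checked with Python's `unicodedata.category`. This is only a sketch for the marks used in this report; the helper name `is_combining_mark` is hypothetical, and the result depends on the Unicode database shipped with the Python version in use.

```python
import unicodedata

def is_combining_mark(ch: str) -> bool:
    """True if the character is a Nonspacing (Mn) or Spacing combining (Mc) mark."""
    return unicodedata.category(ch) in ("Mn", "Mc")

# Marks used in the examples of this report
MARKS = ["\u0901", "\u0902", "\u0903", "\u093C", "\u093F", "\u0947", "\u094D"]

for mark in MARKS:
    print(f"U+{ord(mark):04X}", unicodedata.category(mark), unicodedata.name(mark))
```

Base letters such as क (U+0915) fall under Other letter (Lo), so a check like this separates them from the marks that follow them in a cluster.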
