[Bug 27055] Devanagari and Arabic combining character handling

bugzilla-daemon Sat, 05 Feb 2011 07:41:49 -0800

https://bugzilla.wikimedia.org/show_bug.cgi?id=27055


--- Comment #1 from Dave Ross <thedaver...@gmail.com> 2011-02-05 15:41:31 UTC ---
The discussion can be seen here, but here are the diacritics and characters
provided to me:


Hindi:
First of all, the pairs with nuqta (a dot underneath) and without it should be
searchable the same way Roman letters with diacritics and without are
searchable.
    * क़/क ख़/ख ग़/ग ज़/ज फ़/फ ड़/ड ढ़/ढ 
The letters are not identical, but they should match in search, so that if a user
typed खून, ख़ून would also be listed.
    * Words containing the diacritics ॉ (candra) and ् (virama) should be equal to
those without them: चॉकलेट / चाकलेट, सन् / सन. This is similar to the way English
entries with a space are equal to those with a hyphen (-) between them.
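The three Hindi rules could be expressed as one folding function applied to both the indexed text and the query. This is a minimal sketch assuming Python's standard `unicodedata` module. `fold_devanagari` is a hypothetical name, and mapping ॉ to ा (rather than deleting it) is an assumption inferred from the चॉकलेट / चाकलेट example.

```python
import unicodedata

NUKTA = "\u093C"     # ़ DEVANAGARI SIGN NUKTA
VIRAMA = "\u094D"    # ् DEVANAGARI SIGN VIRAMA
CANDRA_O = "\u0949"  # ॉ DEVANAGARI VOWEL SIGN CANDRA O
AA = "\u093E"        # ा DEVANAGARI VOWEL SIGN AA


def fold_devanagari(text: str) -> str:
    """Hypothetical search key that ignores nukta, virama and candra differences."""
    # NFD splits precomposed letters such as क़ (U+0958) into क + nukta,
    # so precomposed and already-decomposed input end up identical.
    text = unicodedata.normalize("NFD", text)
    # Drop the nukta and the virama outright.
    text = text.replace(NUKTA, "").replace(VIRAMA, "")
    # Assumption: ॉ folds to ा, so चॉकलेट matches चाकलेट.
    return text.replace(CANDRA_O, AA)


print(fold_devanagari("ख़ून") == fold_devanagari("खून"))  # True
```

Because the virama is removed everywhere, conjuncts such as क्ष also fold to कष. Matching still works as long as the same function runs on the index and the query, but it does widen the set of matches.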
----
Arabic:
    * Different forms of alif (ا, أ, إ, ﺁ and ٱ) should be searchable
together, e.g. أمس and امس.
    * Words containing any of these diacritics could be searchable as if they
don't have them and the other way around: 
ـَ fatHa, ـِ kasra, ـُ Damma, ـْ sukuun, ـّ shadda, ـٰ dagger 'alif. 
----
    * ـٌ tanwiin al-Damm (تنوين الضم) 
    * ـٍ tanwiin al-kasr (تنوين الكسر) 
    * ـً tanwiin al-fatH (تنوين الفتح) 
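The alif and diacritic requests above could be handled by a translation table applied after NFKC normalization. This is a minimal sketch under stated assumptions: `fold_arabic` is a hypothetical name, and folding every alif variant to bare ا (rather than keeping some distinct) is a design choice, not something the request specifies. NFKC is used so that presentation forms such as ﺁ (U+FE81) become ordinary letters before the table is applied. It also folds other compatibility characters, such as ligatures.

```python
import unicodedata

# Alif variants folded to bare alif ا (U+0627); the mapping is an assumption.
ALIF_VARIANTS = {
    "\u0623": "\u0627",  # أ alif with hamza above
    "\u0625": "\u0627",  # إ alif with hamza below
    "\u0622": "\u0627",  # آ alif with madda above
    "\u0671": "\u0627",  # ٱ alif wasla
}

# Marks to ignore: tanwin al-fatH, al-Damm, al-kasr, then fatHa, Damma,
# kasra, shadda, sukuun and dagger alif.
ARABIC_MARKS = "\u064B\u064C\u064D\u064E\u064F\u0650\u0651\u0652\u0670"

TABLE = str.maketrans({**ALIF_VARIANTS, **{mark: None for mark in ARABIC_MARKS}})


def fold_arabic(text: str) -> str:
    """Hypothetical search key: NFKC, then fold alif forms and drop harakat."""
    return unicodedata.normalize("NFKC", text).translate(TABLE)


print(fold_arabic("أمس") == fold_arabic("امس"))  # True
```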
----
Persian often uses a zero-width non-joiner (U+200C), as in ویکی‌پدیا. People
who don’t know how to access it tend to substitute a space: ویکی پدیا. It’s a
misspelling, but lots of people can’t help it.
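One way to make both Persian spellings match is to treat the ZWNJ as a word boundary when building search keys. This is a minimal sketch. `fold_zwnj` is a hypothetical name, and mapping ZWNJ to a space (rather than deleting it) is an assumption chosen because the common substitute is a space.

```python
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER


def fold_zwnj(text: str) -> str:
    """Hypothetical search key in which ZWNJ and a space are equivalent."""
    # str.split() does not treat U+200C as whitespace, so replace it first,
    # then collapse runs of whitespace to a single space.
    return " ".join(text.replace(ZWNJ, " ").split())


print(fold_zwnj("ویکی\u200cپدیا") == fold_zwnj("ویکی پدیا"))  # True
```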

In languages like Khmer and Thai, which do not use spaces between words, there is often a
zero-width space (U+200B), as in តើអ្នកនិយាយភាសាអង់គ្លេសទេ. More often
than not, it is simply left out (តើអ្នកនិយាយភាសាអង់គ្លេសទេ). Both spellings are
correct.
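Since both Khmer and Thai spellings are correct, a search key could simply drop the optional zero-width space. This is a minimal sketch, and `fold_zwsp` is a hypothetical name.

```python
ZWSP = "\u200B"  # ZERO WIDTH SPACE


def fold_zwsp(text: str) -> str:
    """Hypothetical search key that ignores optional word-break markers."""
    return text.replace(ZWSP, "")


# តើ + ZWSP + អ្នក folds to the unbroken តើអ្នក.
print(fold_zwsp("តើ\u200bអ្នក") == fold_zwsp("តើអ្នក"))  # True
```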

I think Anatoli neglected to mention the word-final Arabic pair ه/ة. The final
letter ة may be typed as ه.
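The ة/ه pair could be folded the same way. This is a minimal sketch. `fold_ta_marbuta` is a hypothetical name, and replacing every ة (not only word-final ones) is an assumption that relies on ة normally occurring only at the end of a word.

```python
TA_MARBUTA = "\u0629"  # ة ARABIC LETTER TEH MARBUTA
HEH = "\u0647"         # ه ARABIC LETTER HEH


def fold_ta_marbuta(text: str) -> str:
    """Hypothetical search key in which ة and ه are interchangeable."""
    return text.replace(TA_MARBUTA, HEH)


print(fold_ta_marbuta("مدرسة") == fold_ta_marbuta("مدرسه"))  # True
```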

_______________________________________________
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
