--- Comment #1 from Dave Ross <> 2011-02-05 15:41:31 UTC 
The discussion can be seen here, but here are the diacritics and characters
provided to me:

First of all, the pairs with nuqta (a dot underneath) and without it should be
searchable the same way Roman letters with and without diacritics are:
    * क़/क ख़/ख ग़/ग ज़/ज फ़/फ ड़/ड ढ़/ढ 
The letters are not identical, but they should match in search, so that if a
user typed खून, ख़ून would also be listed.
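
A minimal sketch of the nuqta folding described above, assuming the search
index stores a folded key next to each entry and the query is folded the same
way. The function name fold_nuqta is hypothetical. Note that this strips every
nuqta, so it also folds pairs not listed above (for example ऱ/र); restrict it
if that is not wanted.

```python
import unicodedata

NUQTA = "\u093c"  # DEVANAGARI SIGN NUKTA

def fold_nuqta(text: str) -> str:
    """Return a search key with every nuqta removed (hypothetical helper)."""
    # NFD splits precomposed letters such as U+0959 into base letter + nuqta,
    # so both encodings of the same letter are handled alike.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = decomposed.replace(NUQTA, "")
    return unicodedata.normalize("NFC", stripped)

# The word with nuqta (either encoding) and the word without it share one key.
with_nuqta_precomposed = "\u0959\u0942\u0928"
with_nuqta_combining = "\u0916\u093c\u0942\u0928"
without_nuqta = "\u0916\u0942\u0928"
print(fold_nuqta(with_nuqta_precomposed) == fold_nuqta(without_nuqta))
```

Folding both the indexed title and the query means neither spelling has to be
treated as the canonical one.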
    * Words containing the diacritics ॉ (candra) or ् (virama) should be
treated as equal to those without them: चॉकलेट / चाकलेट, सन् / सन. This is
similar to the way English entries with a space are treated as equal to those
with a hyphen (-) in its place.
    * Different forms of alif: ا, أ‎, إ‎, ﺁ‎ and ٱ‎‎ should be searchable
together, e.g. أمس and امس, etc.
    * Words containing any of these diacritics should be searchable as if they
did not have them, and the other way around:
ـَ fatHa, ـِ kasra, ـُ Damma, ـْ sukuun, ـّ shadda, ـٰ dagger 'alif. 
    * ـٌ tanwiin al-Damm (تنوين الضم) 
    * ـٍ tanwiin al-kasr (تنوين الكسر) 
    * ـً tanwiin al-fatH (تنوين الفتح) 
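
The candra, virama, alif and Arabic diacritic points above could be sketched
as one translation table. This is only an illustration under assumptions: the
name fold_marks and the table are hypothetical, candra is mapped to the aa
sign because that is what the चॉकलेट / चाकलेट example implies, and virama is
dropped everywhere (a stricter version might drop it only at the end of a
word, as in the सन् / सन example).

```python
import unicodedata

# Assumed folding table, built from the characters listed in the comment.
FOLD_TABLE = {
    0x0623: "\u0627",  # alif with hamza above -> plain alif
    0x0625: "\u0627",  # alif with hamza below -> plain alif
    0x0622: "\u0627",  # alif with madda -> plain alif
    0xFE81: "\u0627",  # presentation form of alif with madda -> plain alif
    0x0671: "\u0627",  # alif wasla -> plain alif
    0x0670: None,      # dagger alif: removed
    0x0949: "\u093e",  # Devanagari candra o -> aa sign
    0x094D: None,      # Devanagari virama: removed
}
# U+064B..U+0652: fathatan, dammatan, kasratan, fatha, damma, kasra,
# shadda, sukun. All are removed from the key.
for codepoint in range(0x064B, 0x0653):
    FOLD_TABLE[codepoint] = None

def fold_marks(text: str) -> str:
    """Return a search key with the listed marks folded (hypothetical helper)."""
    # NFC first, so alif + combining hamza/madda becomes the precomposed
    # letter that the table knows about.
    return unicodedata.normalize("NFC", text).translate(FOLD_TABLE)

print(fold_marks("\u0623\u0645\u0633") == fold_marks("\u0627\u0645\u0633"))
```

As with any key-based folding, the same function has to be applied to both the
stored entry and the search term.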
Persian often uses a zero-width nonjoiner (&#x200C;) as in ویکی‌پدیا. People
who don’t know how to access it tend to substitute a space: ویکی پدیا. It’s a
misspelling, but lots of people can’t help it.
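
One way to make the two Persian spellings match, sketched under the
assumption that a zero-width nonjoiner and a space can share one search key.
The name fold_zwnj is hypothetical.

```python
import re

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

def fold_zwnj(text: str) -> str:
    """Treat ZWNJ like a space, then collapse whitespace (hypothetical helper)."""
    return re.sub(r"\s+", " ", text.replace(ZWNJ, " ")).strip()

with_zwnj = "\u0648\u06cc\u06a9\u06cc\u200c\u067e\u062f\u06cc\u0627"
with_space = "\u0648\u06cc\u06a9\u06cc \u067e\u062f\u06cc\u0627"
print(fold_zwnj(with_zwnj) == fold_zwnj(with_space))
```

Mapping the nonjoiner to a space rather than deleting it keeps the key equal
to what people actually type when they cannot enter the character.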

In languages like Khmer and Thai that do not use word spaces, there is often a
zero-width space (&#x200B;) as in តើអ្នកនិយាយ​ភាសាអង់គ្លេស​ទេ. More often
than not, it is simply left out (តើអ្នកនិយាយភាសាអង់គ្លេសទេ). Both spellings are

I think Anatoli neglected to mention the word-final Arabic pair ه/ة. The final
letter ة may be typed as ه.
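
The last two points (the optional zero-width space, and word-final ة typed as
ه) might be handled along these lines. This is a sketch with hypothetical
helper names; "word-final" is approximated here as "not followed by a letter,
digit or underscore", which is an assumption and not something the discussion
specifies.

```python
import re

ZWSP = "\u200b"         # ZERO WIDTH SPACE
TEH_MARBUTA = "\u0629"  # ARABIC LETTER TEH MARBUTA
HEH = "\u0647"          # ARABIC LETTER HEH

# Teh marbuta that is not followed by another word character.
FINAL_TEH_MARBUTA = re.compile(TEH_MARBUTA + r"(?!\w)")

def strip_zwsp(text: str) -> str:
    """Drop zero-width spaces so both spellings share a key (hypothetical)."""
    return text.replace(ZWSP, "")

def fold_final_heh(text: str) -> str:
    """Fold word-final teh marbuta to heh (hypothetical helper)."""
    return FINAL_TEH_MARBUTA.sub(HEH, text)

print(strip_zwsp("\u1797\u17b6\u179f\u17b6\u200b\u1791\u17c1"))
print(fold_final_heh("\u0645\u062f\u0631\u0633\u0629"))
```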
