https://bugzilla.wikimedia.org/show_bug.cgi?id=27055
--- Comment #1 from Dave Ross <thedaver...@gmail.com> 2011-02-05 15:41:31 UTC ---
The discussion can be seen here, but here are the diacritics and characters provided to me:

Hindi:

First of all, the pairs with nuqta (a dot underneath) and without it should be searchable the same way Roman letters with and without diacritics are searchable.

* क़/क ख़/ख ग़/ग ज़/ज फ़/फ ड़/ड ढ़/ढ

The letters are not identical, but if a user typed खून, ख़ून should also be listed.

* Words containing the diacritics ॉ (candra) or ् (virama) should be equal to those without them: चॉकलेट / चाकलेट, सन् / सन. This is similar to the way English entries with a space are equal to those with a hyphen (-) in its place.

----

Arabic:

* Different forms of alif: ا, أ, إ, ﺁ and ٱ should be searchable together, e.g. أمس and امس, etc.
* Words containing any of these diacritics should be searchable as if they did not have them, and the other way around: ـَ fatHa, ـِ kasra, ـُ Damma, ـْ sukuun, ـّ shadda, ـٰ dagger 'alif.

----

* ـٌ tanwiin al-Damm (تنوين الضم)
* ـٍ tanwiin al-kasr (تنوين الكسر)
* ـً tanwiin al-fatH (تنوين الفتح)

----

Persian often uses a zero-width non-joiner (&#x200C;) as in ویکیپدیا. People who don’t know how to type it tend to substitute a space: ویکی پدیا. It’s a misspelling, but lots of people can’t help it.

In languages like Khmer and Thai that do not use word spaces, there is often a zero-width space (&#x200B;) as in តើអ្នកនិយាយភាសាអង់គ្លេសទេ. More often than not, it is simply left out (តើអ្នកនិយាយភាសាអង់គ្លេសទេ). Both spellings are correct.

I think Anatoli neglected to mention the word-final Arabic pair ه/ة. The final letter ة may be typed as ه.
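To make the requests above concrete, here is a rough sketch of a search-key folding function in Python. It is not taken from MediaWiki or any existing search backend. The name search_key, the folding table, and several choices are assumptions made for illustration: only a word-final virama is dropped (so conjuncts inside a word survive), the zero-width non-joiner is folded to a plain space so that it matches the common space-substituted spelling, and the zero-width space is removed outright.

```python
import re
import unicodedata

# Hypothetical folding table built from the characters listed in the comment.
_FOLD = {
    0x093C: None,    # Devanagari nuqta: removed
    0x0949: 0x093E,  # Devanagari candra o -> vowel sign aa
    0x0670: None,    # Arabic dagger alif: removed
    0x200B: None,    # zero-width space: removed (assumption)
    0x200C: 0x0020,  # zero-width non-joiner -> space (assumption)
}
# Alif with madda, hamza above, hamza below, wasla, and the isolated
# presentation form of alif madda all fold to plain alif.
for cp in (0x0622, 0x0623, 0x0625, 0x0671, 0xFE81):
    _FOLD[cp] = 0x0627
# Tanwiin, fatHa, Damma, kasra, shadda, sukuun (U+064B..U+0652): removed.
for cp in range(0x064B, 0x0653):
    _FOLD[cp] = None

# A virama or taa marbuuTa that is not followed by another letter.
_FINAL_VIRAMA = re.compile(r"\u094D(?!\w)")
_FINAL_TA_MARBUTA = re.compile(r"\u0629(?!\w)")


def search_key(text: str) -> str:
    """Return a folded form of text intended only for search comparison."""
    # NFC leaves the precomposed nuqta letters U+0958..U+095F decomposed
    # (they are composition exclusions), so dropping U+093C covers both
    # ways of encoding them; it also composes alif + combining hamza.
    text = unicodedata.normalize("NFC", text)
    text = text.translate(_FOLD)
    text = _FINAL_VIRAMA.sub("", text)
    # Word-final taa marbuuTa is folded to haa, since it may be typed that way.
    return _FINAL_TA_MARBUTA.sub("\u0647", text)
```

With this sketch, search_key("\u0959\u0942\u0928") and search_key("\u0916\u0942\u0928") (the two spellings of the Hindi example with and without nuqta) compare equal, as do the two alif spellings of the Arabic example. A real implementation would more likely live in the search index's analyzer, and the space-versus-no-separator question for Persian, Khmer and Thai needs more than character folding.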