https://bugzilla.wikimedia.org/show_bug.cgi?id=27055
Summary: Devanagari and Arabic combining character handling
Product: MediaWiki extensions
Version: any
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: Normal
Component: MWSearch
AssignedTo: rain...@eunet.rs
ReportedBy: tstarl...@wikimedia.org

User:Atitarev from Wiktionary has complained that the normalisation used by Lucene does not suit Hindi and Arabic. In the examples I have been given, combining characters such as U+093C are used to add diacritics to characters, and the resulting combinations have no composed form in Unicode.

It is requested that the combining marks be stripped before search indexing is done, so that titles which differ only by the combining marks they contain can be returned in "did you mean" and autocomplete results.

A list of affected characters will be given as a comment or attachment.
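
To make the request concrete, here is a minimal sketch in Python of the kind of folding being asked for. It is not the Lucene or MWSearch implementation. The function name `fold_title` and the set `STRIP_MARKS` are hypothetical, and the set is only a placeholder: apart from U+093C, which is named above, the Arabic range is an assumption standing in for the list of affected characters that has yet to be supplied.

```python
import unicodedata

# Placeholder set of combining marks to strip (an assumption, not the
# promised list). U+093C (Devanagari nukta) is the one mark named in the
# report; U+064B..U+0652 are the common Arabic vowel marks, used here
# purely for illustration.
STRIP_MARKS = {"\u093c"} | {chr(cp) for cp in range(0x064B, 0x0653)}


def fold_title(title: str) -> str:
    """Return a search key with the selected combining marks removed."""
    # Decompose first so that precomposed letters such as U+0958 expose
    # their combining mark as a separate code point.
    decomposed = unicodedata.normalize("NFD", title)
    stripped = "".join(ch for ch in decomposed if ch not in STRIP_MARKS)
    # Recompose whatever was decomposed but not stripped.
    return unicodedata.normalize("NFC", stripped)


if __name__ == "__main__":
    # Two titles that differ only by a nukta fold to the same key.
    with_nukta = "\u091c\u093c\u0930\u093e"   # Devanagari, with U+093C
    without_nukta = "\u091c\u0930\u093e"      # same letters, no U+093C
    print(fold_title(with_nukta) == fold_title(without_nukta))
```

Stripping only an explicit set, rather than every character in the Unicode "Mark" categories, is deliberate in this sketch: Devanagari vowel signs are also combining marks, and removing them would change the word rather than just its diacritics.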