Summary: Devanagari and Arabic combining character handling
Product: MediaWiki extensions
User:Atitarev from Wiktionary has complained that the normalisation used by
Lucene does not suit Hindi and Arabic. In the examples I have been given,
combining characters such as U+093C (DEVANAGARI SIGN NUKTA) are used to add
diacritics to base characters, and the resulting combinations have no composed
form in Unicode. The request is that these combining marks be stripped before
search indexing, so that titles which differ only in the combining marks they
contain can be returned in "did you mean" and autocomplete results.
A list of affected characters will be given as a comment or attachment.
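A minimal Python sketch of the requested folding, for illustration only. It is not Lucene's API or MediaWiki's search code. The names `fold_title`, `build_variant_index` and `lookup_variants` are hypothetical, and so is the `STRIP_MARKS` set: it is a placeholder until the real list of affected characters arrives, and covers only the nukta and the basic Arabic harakat (U+064B to U+0652). Titles are NFD-decomposed first, so a precomposed nukta letter such as U+0958 folds to the same key as U+0915 followed by U+093C.

```python
import unicodedata
from collections import defaultdict

# Hypothetical placeholder set; the real list is to be supplied on the bug.
# DEVANAGARI SIGN NUKTA plus Arabic tanwin, fatha, damma, kasra, shadda, sukun.
STRIP_MARKS = frozenset(["\u093C"] + [chr(cp) for cp in range(0x064B, 0x0653)])


def fold_title(title: str) -> str:
    """Build a search key for `title` with the listed combining marks removed."""
    # NFD splits precomposed nukta letters (e.g. U+0958) into base + U+093C,
    # so both spellings of the same letter produce the same key.
    decomposed = unicodedata.normalize("NFD", title)
    kept = "".join(ch for ch in decomposed if ch not in STRIP_MARKS)
    # Recompose whatever remains so keys are in a canonical (NFC) form.
    return unicodedata.normalize("NFC", kept)


def build_variant_index(titles):
    """Map each folded key to the original titles that share it."""
    index = defaultdict(list)
    for title in titles:
        index[fold_title(title)].append(title)
    return index


def lookup_variants(index, query):
    """Return titles differing from `query` only in the stripped marks."""
    # .get does not insert into the defaultdict, so misses leave it unchanged.
    return index.get(fold_title(query), [])
```

An explicit set is used here rather than stripping every nonspacing mark (general category Mn), because in Devanagari that category also contains vowel signs such as U+0941 and the virama U+094D, and removing those would merge genuinely different words.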
Wikibugs-l mailing list