https://bugzilla.wikimedia.org/show_bug.cgi?id=27055

           Summary: Devanagari and Arabic combining character handling
           Product: MediaWiki extensions
           Version: any
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: Normal
         Component: MWSearch
        AssignedTo: rain...@eunet.rs
        ReportedBy: tstarl...@wikimedia.org


User:Atitarev from Wiktionary has complained that the normalisation used by
Lucene does not suit Hindi and Arabic. In the examples I have been given,
combining characters such as U+093C are used to add diacritics to characters,
and the resulting combinations have no composed form in Unicode. It is
requested that the combining marks be stripped before search indexing is done,
so that titles which differ only by the combining marks they contain can be
returned in "did you mean" and autocomplete results.

A list of affected characters will be given as a comment or attachment.
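
As a rough illustration of the requested behaviour, here is a minimal Python
sketch, not the MWSearch or Lucene implementation. The function name
strip_marks and the STRIP_MARKS set are hypothetical. Only U+093C comes from
this report; the Arabic range U+064B to U+0652 is an assumed stand-in until
the real list of affected characters is supplied.

```python
import unicodedata

# Assumed, illustrative set of code points to strip before indexing.
# U+093C (Devanagari nukta) is named in the report; the Arabic vowel-mark
# range U+064B..U+0652 is a guess and should be replaced by the real list.
STRIP_MARKS = {0x093C} | set(range(0x064B, 0x0653))


def strip_marks(text):
    """Return text with the configured combining marks removed."""
    # Decompose first so that any precomposed letters expose their marks.
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ord(ch) not in STRIP_MARKS)
    # Recompose whatever remains into canonical composed form.
    return unicodedata.normalize("NFC", kept)


if __name__ == "__main__":
    # Print the code points of the normalised form of JA + NUKTA.
    print([hex(ord(c)) for c in strip_marks("\u091c\u093c")])
```

An explicit list is used rather than dropping every character in the Unicode
"Mark" categories, because that would also remove Devanagari vowel signs and
the virama, which distinguish unrelated words.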
