https://bugzilla.wikimedia.org/show_bug.cgi?id=40821

       Web browser: ---
             Bug #: 40821
           Summary: PostgreSQL searches do not treat Unicode full width
                    characters as their normal counterparts
           Product: MediaWiki
           Version: 1.21-git
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: Unprioritized
         Component: Search
        AssignedTo: [email protected]
        ReportedBy: [email protected]
    Classification: Unclassified
   Mobile Platform: ---


The search engines for MySQL and SQLite treat "ＡＺ" (that's #xff21 and #xff3a)
as "AZ" (cf. [[Halfwidth and fullwidth forms]]); PostgreSQL does not and thus
fails testFullWidth().

One idea would be to TRANSLATE() them in ts2_page_text() and ts2_page_title()
and use a similar technique in SearchPostgres::parseQuery().  If so, we need to
describe in the release notes how to regenerate the tsvectors after an update
or detect if ts2_page_text() or ts2_page_title() has changed and then
regenerate them ourselves (I prefer the former).
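
To illustrate the character mapping such a TRANSLATE() call would have to
encode, here is a minimal sketch in Python (not MediaWiki or PostgreSQL code).
It relies on the fullwidth ASCII variants U+FF01..U+FF5E sitting exactly
0xFEE0 above their ordinary counterparts; the names FULLWIDTH_TO_ASCII and
fold_fullwidth are hypothetical and chosen only for this example.

```python
# Fullwidth ASCII variants occupy U+FF01..U+FF5E and sit exactly 0xFEE0
# above the ordinary ASCII characters U+0021..U+007E.
FULLWIDTH_TO_ASCII = {cp: cp - 0xFEE0 for cp in range(0xFF01, 0xFF5F)}


def fold_fullwidth(text: str) -> str:
    """Replace fullwidth ASCII variants with their ordinary counterparts."""
    # str.translate leaves every code point not in the table untouched.
    return text.translate(FULLWIDTH_TO_ASCII)


if __name__ == "__main__":
    # U+FF21 and U+FF3A are the fullwidth forms of "A" and "Z".
    print(fold_fullwidth("\uff21\uff3a"))
```

The same folding would presumably have to be applied both when the tsvectors
are built and when the query is parsed, so that the two sides stay consistent.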

Of course, another imaginable approach would be to try to push this
normalization into a text search configuration for to_tsvector(), but I don't
know whether this is even possible.
