https://bugzilla.wikimedia.org/show_bug.cgi?id=40821
Web browser: ---
Bug #: 40821
Summary: PostgreSQL searches do not treat Unicode full width
characters as their normal counterparts
Product: MediaWiki
Version: 1.21-git
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: Unprioritized
Component: Search
AssignedTo: [email protected]
ReportedBy: [email protected]
Classification: Unclassified
Mobile Platform: ---
The search engines for MySQL and SQLite treat "ＡＺ" (that's U+FF21 and U+FF3A)
as "AZ" (cf. [[Halfwidth and fullwidth forms]]); PostgreSQL does not and thus
fails testFullWidth().
One idea would be to TRANSLATE() them in ts2_page_text() and ts2_page_title()
and use a similar technique in SearchPostgres::parseQuery(). If so, we need to
either describe in the release notes how to regenerate the tsvectors after an
update, or detect whether ts2_page_text() or ts2_page_title() has changed and
then regenerate them ourselves (I prefer the former).
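
To make the TRANSLATE() idea concrete, here is a minimal sketch of the mapping
it would have to apply, written in Python rather than SQL or PHP. It assumes
that only the fullwidth ASCII variants U+FF01..U+FF5E need folding; the names
FULLWIDTH_TO_ASCII, fold_fullwidth() and translate_arguments() are hypothetical
and not part of MediaWiki or PostgreSQL.

```python
# Sketch only: a Python stand-in for the mapping a SQL TRANSLATE() call
# would apply. Assumption: only U+FF01..U+FF5E are folded.
# Each fullwidth form sits exactly 0xFEE0 above its ASCII counterpart
# (U+FF21 "fullwidth A" -> U+0041 "A").
FULLWIDTH_TO_ASCII = {cp: cp - 0xFEE0 for cp in range(0xFF01, 0xFF5F)}


def fold_fullwidth(text):
    """Hypothetical helper: replace fullwidth characters with ASCII ones."""
    return text.translate(FULLWIDTH_TO_ASCII)


def translate_arguments():
    """Build the 'from' and 'to' strings for translate(string, from, to)."""
    codepoints = sorted(FULLWIDTH_TO_ASCII)
    from_chars = "".join(chr(cp) for cp in codepoints)
    to_chars = "".join(chr(FULLWIDTH_TO_ASCII[cp]) for cp in codepoints)
    return from_chars, to_chars


if __name__ == "__main__":
    # Fullwidth "AZ" (U+FF21, U+FF3A) folds to plain "AZ".
    print(fold_fullwidth("\uff21\uff3a"))
```

The same folding would have to be applied both when the tsvectors are built
and when the query is parsed, otherwise indexed text and search terms would
not match.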
Of course, another conceivable approach would be to try to push this
normalization into a text search configuration for to_tsvector(), but I don't
know whether this is even possible.