https://bugzilla.wikimedia.org/show_bug.cgi?id=47632
Web browser: ---
Bug ID: 47632
Summary: Use case sensitive collation in wb_items_by_site
Product: MediaWiki extensions
Version: unspecified
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: Unprioritized
Component: WikidataRepo
Assignee: [email protected]
Reporter: [email protected]
CC: [email protected]
Classification: Unclassified
Mobile Platform: ---
In the wb_items_by_site table, the ips_page_title field is declared as VARCHAR.
By default, MySQL applies a case-insensitive collation to fields of that
type, removing the distinction between Foo, FOO and FoO. That distinction
is quite important, however: we might have distinct links to all of these.
Note that this doesn't happen when MediaWiki is set up in "binary" database
mode, since VARCHAR then gets changed to VARBINARY automatically. But this
should still work correctly for people using UTF-8 mode. So:
We can either declare this field to use binary collation, like we do for
term_text in the wb_terms table: ips_page_title VARCHAR(255) BINARY NOT NULL.
Or we could name a case-*sensitive* UTF-8 collation explicitly: ips_page_title
VARCHAR(255) COLLATE utf8_bin NOT NULL. (A collation ending in _ci, such as
utf8_unicode_520_ci, is case insensitive and would not help here.)
However, we must test how well schema conversion works with these, for the
different MySQL modes as well as for SQLite, PostgreSQL, etc.
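
To illustrate the problem, here is a minimal sketch using Python's built-in
sqlite3 module. It is only a stand-in: SQLite's NOCASE and BINARY collations
play the role of MySQL's case-insensitive and binary collations, and the
table name items_by_site and the helper count_matches are hypothetical, not
the real Wikibase schema or code.

```python
import sqlite3

# Collations this sketch accepts; both are built into SQLite.
ALLOWED_COLLATIONS = {"NOCASE", "BINARY"}


def count_matches(collation: str, lookup: str = "Foo") -> int:
    """Store Foo, FOO and FoO in a column that uses the given collation
    and return how many rows an equality lookup for `lookup` finds."""
    if collation not in ALLOWED_COLLATIONS:
        raise ValueError(f"unsupported collation: {collation}")

    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical, simplified stand-in for wb_items_by_site.
        conn.execute(
            "CREATE TABLE items_by_site ("
            f"ips_page_title VARCHAR(255) COLLATE {collation} NOT NULL)"
        )
        conn.executemany(
            "INSERT INTO items_by_site (ips_page_title) VALUES (?)",
            [("Foo",), ("FOO",), ("FoO",)],
        )
        # The column's collation decides how '=' compares the titles.
        (matches,) = conn.execute(
            "SELECT COUNT(*) FROM items_by_site WHERE ips_page_title = ?",
            (lookup,),
        ).fetchone()
        return matches
    finally:
        conn.close()


if __name__ == "__main__":
    print("NOCASE:", count_matches("NOCASE"))
    print("BINARY:", count_matches("BINARY"))
```

With the case-insensitive collation, a lookup for Foo also returns the rows
for FOO and FoO; with the binary collation it returns only the exact title.
Whether MySQL in UTF-8 mode behaves the same way after the schema change
still has to be checked against a real MySQL setup.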