https://bugzilla.wikimedia.org/show_bug.cgi?id=47632

       Web browser: ---
            Bug ID: 47632
           Summary: Use case sensitive collation in wb_items_by_site
           Product: MediaWiki extensions
           Version: unspecified
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: Unprioritized
         Component: WikidataRepo
          Assignee: [email protected]
          Reporter: [email protected]
                CC: [email protected]
    Classification: Unclassified
   Mobile Platform: ---

In the wb_items_by_site table, the ips_page_title field is declared as a
VARCHAR. By default, MySQL applies a case insensitive collation to fields of
that type, removing the distinction between Foo, FOO and FoO. That distinction
is quite important, however: we might have distinct links to all of these.
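
The effect can be reproduced without a MySQL server. The following is only a
sketch: it uses Python's sqlite3 module, with SQLite's NOCASE collation standing
in for MySQL's case insensitive default and BINARY standing in for a case
sensitive collation. The table names (demo_ci, demo_cs) and the title column
are made up for illustration and are not the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumption: NOCASE stands in for a MySQL *_ci collation,
# BINARY for a case sensitive one. These are hypothetical demo tables.
conn.execute("CREATE TABLE demo_ci (title TEXT COLLATE NOCASE)")
conn.execute("CREATE TABLE demo_cs (title TEXT COLLATE BINARY)")

titles = [("Foo",), ("FOO",), ("FoO",)]
conn.executemany("INSERT INTO demo_ci VALUES (?)", titles)
conn.executemany("INSERT INTO demo_cs VALUES (?)", titles)


def count_matches(table, title):
    """Count rows whose title equals `title` under the column's collation."""
    # `table` is only ever one of the two hard-coded demo tables above.
    query = f"SELECT COUNT(*) FROM {table} WHERE title = ?"
    return conn.execute(query, (title,)).fetchone()[0]


ci_matches = count_matches("demo_ci", "Foo")  # all case variants match
cs_matches = count_matches("demo_cs", "Foo")  # only the exact title matches
print(ci_matches, cs_matches)
```

With the case insensitive column, a lookup for Foo also returns the rows for
FOO and FoO; with the binary column it returns only the exact title.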

Note that this doesn't happen when setting up MediaWiki in "binary" database
mode, since then VARCHAR gets changed to VARBINARY automatically. But this
should still work correctly for people using UTF-8 mode. So:

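We can either declare this field to use binary collation, like we do for
term_text in the wb_terms table: ips_page_title VARCHAR(255) BINARY NOT NULL.
Or we could declare it with an explicit case *sensitive* UTF-8 collation. Note
that utf8_unicode_520_ci does not qualify: the _ci suffix marks a collation as
case insensitive. The explicit form would need a _bin or _cs collation instead,
for example: ips_page_title VARCHAR(255) COLLATE utf8_bin NOT NULL.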

However, we need to test how well schema conversion works with these, for the
different MySQL modes as well as for SQLite, PostgreSQL, etc.
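
One way to check a backend is to see whether Foo and FOO can coexist under a
unique index. The sketch below does this for SQLite only, via Python's sqlite3
module. The helper accepts_case_variants, the table t and the unique index on
the title column are all assumptions made for illustration; a check for MySQL
or PostgreSQL would need that backend's driver and its own column declaration.

```python
import sqlite3


def accepts_case_variants(column_ddl):
    """Return True if 'Foo' and 'FOO' can coexist under a unique index.

    `column_ddl` is the type/collation part of a column definition.
    Hypothetical helper for illustration; SQLite only.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(f"CREATE TABLE t (title {column_ddl} NOT NULL)")
        # The unique index inherits the column's collation.
        conn.execute("CREATE UNIQUE INDEX t_title ON t (title)")
        conn.execute("INSERT INTO t VALUES ('Foo')")
        try:
            conn.execute("INSERT INTO t VALUES ('FOO')")
        except sqlite3.IntegrityError:
            # The collation treats both titles as the same key.
            return False
        return True
    finally:
        conn.close()


candidates = ("TEXT COLLATE NOCASE", "TEXT COLLATE BINARY", "VARCHAR(255)")
results = {ddl: accepts_case_variants(ddl) for ddl in candidates}
for ddl, ok in results.items():
    print(f"{ddl}: {'distinct' if ok else 'collapsed'}")
```

SQLite's default collation is BINARY, so a plain VARCHAR(255) column there
already keeps the case variants apart; only an explicit NOCASE collapses them.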

-- 
You are receiving this mail because:
You are on the CC list for the bug.
_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
