thiemowmde created this task.
thiemowmde added projects: Wikidata, MediaWiki-extensions-WikibaseRepository, DataValues.
Restricted Application added a subscriber: Aklapper.

TASK DESCRIPTION

commonsMedia values are stored as strings that contain only the file name, e.g. Example_en.svg refers to https://commons.wikimedia.org/wiki/File:Example_en.svg. A validator checks that the file name is valid and that the file exists on Commons, but apart from whitespace trimming there is no normalization or parsing. As a result, all of the following can exist side by side, even though they all refer to the same file on Commons:

  • Example_en.svg
  • Example en.svg
  • example en.svg

This is a problem wherever one specific form of a page title is expected, e.g. spaces for human-readable labels but underscores for links. For example, T99664 ("[Bug] Diff does not show stored capitalisation of first letter") would not have happened with normalization in place.
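Without normalization, every consumer has to derive the label form and the link form on its own. A minimal sketch of those two conversions, assuming the stored value is a bare file name; the function names and the `COMMONS_WIKI_BASE` constant are hypothetical, and the percent-encoding only roughly mirrors how MediaWiki builds page URLs. Note that neither function fixes capitalization, which is exactly the gap T99664 fell into.

```python
from urllib.parse import quote

# Hypothetical constant: base URL for Commons page links.
COMMONS_WIKI_BASE = "https://commons.wikimedia.org/wiki/"


def to_label(file_name: str) -> str:
    """Human-readable form: underscores are shown as spaces."""
    return file_name.replace("_", " ")


def to_link(file_name: str) -> str:
    """Link form: spaces become underscores, then the title is percent-encoded."""
    db_key = "File:" + file_name.replace(" ", "_")
    # Keep ':' and '/' readable; quote() escapes other unsafe characters.
    return COMMONS_WIKI_BASE + quote(db_key, safe=":/")


print(to_label("Example_en.svg"))  # Example en.svg
print(to_link("Example en.svg"))   # https://commons.wikimedia.org/wiki/File:Example_en.svg
```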

Proposal:

  1. Decide which form should be stored in the database. (Personally, I suggest storing the human-readable form Example en.svg, with spaces and the first character capitalized, because this is what people see and expect most.)
  2. Implement a parser that applies this to all new and edited values.
  3. Optionally walk through all existing values and normalize them accordingly.
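A rough sketch of steps 1 to 3, assuming the suggested human-readable form (spaces, first character uppercase) is chosen. It is a simplification of MediaWiki's title normalization: only spaces and underscores are collapsed (MediaWiki also treats some other Unicode whitespace this way), and Python's `str.upper()` can differ from PHP's first-letter capitalization for a few characters. All function names and the statement keys in the usage lines are hypothetical.

```python
import re

# Runs of spaces and/or underscores are equivalent in page titles.
_SEPARATORS = re.compile(r"[ _]+")


def normalize_commons_file_name(value: str) -> str:
    """Steps 1 and 2: map any accepted spelling to the proposed stored form."""
    # Collapse separator runs to one space, then drop leading/trailing ones.
    name = _SEPARATORS.sub(" ", value.strip()).strip(" ")
    if not name:
        raise ValueError("empty file name")
    # The first character of a Commons file name is case-insensitive;
    # store it uppercase.
    return name[0].upper() + name[1:]


def plan_migration(stored: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Step 3: list existing values that would change, as (old, new) pairs."""
    changes = {}
    for key, old in stored.items():
        new = normalize_commons_file_name(old)
        if new != old:
            changes[key] = (old, new)
    return changes


for v in ["Example_en.svg", "Example en.svg", "example en.svg"]:
    print(normalize_commons_file_name(v))  # Example en.svg, three times

print(plan_migration({"statement-1": "Example_en.svg", "statement-2": "Example en.svg"}))
# {'statement-1': ('Example_en.svg', 'Example en.svg')}
```

Reporting changes before applying them keeps step 3 optional: the output can be reviewed or counted before anything is rewritten.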

TASK DETAIL
https://phabricator.wikimedia.org/T204723


To: thiemowmde
Cc: hoo, WMDE-leszek, Jonas, Lydia_Pintscher, Addshore, thiemowmde, Aklapper, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, Wikidata-bugs, aude, Mbch331