thiemowmde created this task. thiemowmde added projects: Wikidata, MediaWiki-extensions-WikibaseRepository, DataValues. Restricted Application added a subscriber: Aklapper.
commonsMedia values are stored as strings that contain only the file name, e.g. Example_en.svg refers to https://commons.wikimedia.org/wiki/File:Example_en.svg. A validator checks that the file name is valid and that the file exists on Commons, but apart from whitespace trimming there is no normalization or parsing. As a result, all of the following can exist side by side while referring to the same file on Commons:
- Example_en.svg
- Example en.svg
- example en.svg
This is a problem wherever one specific form of a page title is expected, e.g. with spaces for human-readable labels but with underscores for links. For example, T99664 ("[Bug] Diff does not show stored capitalisation of first letter") would not have happened with normalization in place.
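To illustrate why a single stored form helps, here is a minimal Python sketch (not Wikibase code) that derives both the label form and the link form from one stored value. The helper names `to_label` and `to_url` are hypothetical, and the sketch assumes the stored value is already in a canonical form; it does not fix the capitalisation of the first letter.

```python
from urllib.parse import quote

# Base URL taken from the example in this task.
COMMONS_FILE_URL = "https://commons.wikimedia.org/wiki/File:"


def to_label(stored: str) -> str:
    """Human-readable form: underscores shown as spaces (hypothetical helper)."""
    return stored.replace("_", " ")


def to_url(stored: str) -> str:
    """Link form: spaces become underscores, then percent-encode the rest."""
    return COMMONS_FILE_URL + quote(stored.replace(" ", "_"))


print(to_label("Example_en.svg"))
print(to_url("Example en.svg"))
```

With only one form in the database, every consumer can apply the conversion it needs without guessing which variant it was given.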
Proposal:
- Decide which form should be in the database. (Personally, I suggest storing the human-readable form Example en.svg, with spaces and the first character capitalized, because this is what people see and expect the most.)
- Implement a parser that applies this to all new and edited values.
- Optionally walk through all existing values and normalize them accordingly.
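The proposal could be sketched roughly as follows. This is an illustration in Python under stated assumptions, not the actual parser: the function names `normalize_commons_file_name` and `values_needing_update` are hypothetical, the target form is the suggested one (spaces, first character capitalized), and real MediaWiki title normalization covers more cases than the three rules shown here.

```python
import re


def normalize_commons_file_name(value: str) -> str:
    """Normalize to the suggested stored form (hypothetical helper)."""
    # Underscores and spaces are interchangeable in titles; store spaces.
    name = value.replace("_", " ")
    # Collapse runs of whitespace and trim both ends.
    name = re.sub(r"\s+", " ", name).strip()
    # Capitalize only the first character; leave the rest untouched.
    return name[:1].upper() + name[1:]


def values_needing_update(values):
    """Map each existing value that is not yet normalized to its new form."""
    updates = {}
    for value in values:
        normalized = normalize_commons_file_name(value)
        if normalized != value:
            updates[value] = normalized
    return updates


existing = ["Example_en.svg", "Example en.svg", "example en.svg"]
for old, new in values_needing_update(existing).items():
    print(f"{old!r} -> {new!r}")
```

The first function corresponds to the parser applied to new and edited values; the second corresponds to the optional pass over existing values, reporting only those that would actually change.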
Cc: hoo, WMDE-leszek, Jonas, Lydia_Pintscher, Addshore, thiemowmde, Aklapper, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, Wikidata-bugs, aude, Mbch331