--- Comment #15 from Philippe Verdy <> ---

* filter the given file name / page name / category name, containing HTML
  entities as returned by various parser functions
      (like lc:, uc:, #if:, #switch:...),
  through #titleparts to convert these HTML entities back to plain characters

* the returned value can then be passed to parser functions that do not
  accept these HTML entities.

The HTML entities we need to handle are notably those for these characters:

  ' " &

which are valid in page names (the < and > characters are not valid in
page names, so they will remain encoded after calling #titleparts).
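
To make the intended effect concrete, here is a rough Python sketch of the
decoding step described above. It is not MediaWiki's code and does not call
#titleparts; the function name decode_title_entities is hypothetical, and the
only behaviour it models is "decode the entities, but leave < and > encoded".

```python
import html


def decode_title_entities(name: str) -> str:
    """Hypothetical helper: mimic the entity decoding described above.

    Assumption: all HTML entities are decoded to plain characters, except
    that < and > stay encoded because they are not valid in page names.
    """
    decoded = html.unescape(name)  # &#39; -> ', &quot; -> ", &amp; -> &
    # Re-encode the two characters that must remain as entities.
    return decoded.replace("<", "&lt;").replace(">", "&gt;")


if __name__ == "__main__":
    for raw in ("L&#39;Aquila", "Rock &amp; Roll", "a &lt;b&gt; c"):
        print(raw, "->", decode_title_entities(raw))
```

The real parser function also normalizes titles in other ways, which this
sketch deliberately ignores.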

For details about the various encodings used in page names, see

  [[mw:Manual:PAGENAMEE encoding]]

which details how characters may get encoded.
This covers the full ASCII set, and the first printable non-ASCII characters
(tested with UTF-8 assumed for their plain-text encoding).
It also covers other contextual changes: some characters are not encoded but
may be changed or dropped in leading positions, and a few characters get
transformed within specific subsequences anywhere in the string (such as
slashes and periods).

But I agree that functions like PAGESINCATEGORY, FILEPATH... should properly
decode these HTML entities (and notably the 3 characters above; the most
frequent one encountered being the ASCII apostrophe-quote).
