Smalyshev added a comment.

I'd like to resolve this issue one way or another. For a page named "Category:Pretty flowers", what we currently have is:

  • $siteLink->getPageName() returns article name as-is (Category:Pretty flowers)
  • Rdf generator now applies rawurlencode, producing: Category%3APretty%20flowers
  • Title->getCanonicalURL produces: Category:Pretty_flowers

Both Title and Rdf encode non-ASCII characters, but they differ in how they handle spaces and some ASCII characters (such as the colon).
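
To make the difference concrete, here is a minimal standalone sketch. encodeLikeTitle() is a hypothetical helper, not a MediaWiki function; it only approximates what Title appears to do (underscores for spaces, a few ASCII characters left literal). The list of restored characters is an assumption and is probably incomplete.

```php
<?php
// Hypothetical helper: approximates Title-style encoding outside MediaWiki.
function encodeLikeTitle( string $pageName ): string {
    // Title works on the DB key form, where spaces become underscores.
    $dbKey = str_replace( ' ', '_', $pageName );
    // Percent-encode everything outside the RFC 3986 unreserved set.
    $encoded = rawurlencode( $dbKey );
    // Restore characters assumed to stay literal (assumption: partial list).
    return str_ireplace( [ '%3A', '%2F' ], [ ':', '/' ], $encoded );
}

$pageName = 'Category:Pretty flowers';
echo rawurlencode( $pageName ), "\n";    // Category%3APretty%20flowers
echo encodeLikeTitle( $pageName ), "\n"; // Category:Pretty_flowers
```

Non-ASCII characters come out percent-encoded the same way in both forms; only the space and the colon differ in this example.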

Currently the RDF algorithm is either:

$baseUrl = str_replace( '$1', rawurlencode( $siteLink->getPageName() ), $site->getLinkPath() );
if ( !parse_url( $baseUrl, PHP_URL_SCHEME ) ) {
   $url = ""
} else {
   $url = ""
}

or:

global $wgArticlePath;
$url = "" '$1', rawurlencode( $title->getPrefixedText() ), $wgArticlePath );

Neither matches what Title does. This may be annoying, especially since you can't copy page URLs and use them directly in SPARQL queries.

Should we change the behavior to match what Title does? Can we do it reliably? SiteLinksRdfBuilder may not have access to the configuration needed to actually use Title, but maybe we could simulate it. Should we perhaps use wfUrlencode instead of rawurlencode?
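
A rough sketch of what "simulating it" could look like without touching Title or any wiki configuration. buildSiteLinkUrl() is a hypothetical function, the link path is an example value rather than anything read from a real Site object, and the set of characters left unescaped is an assumption that would need checking against what wfUrlencode really does.

```php
<?php
// Hypothetical helper, not part of Wikibase or MediaWiki.
function buildSiteLinkUrl( string $pageName, string $linkPath ): string {
    // Mimic the DB key form: spaces become underscores.
    $dbKey = str_replace( ' ', '_', $pageName );
    // Percent-encode, then restore characters assumed to stay literal
    // in Title URLs (assumption: partial list).
    $encoded = str_ireplace(
        [ '%3A', '%2F' ],
        [ ':', '/' ],
        rawurlencode( $dbKey )
    );
    // Substitute into the site's link path, as the current code does.
    return str_replace( '$1', $encoded, $linkPath );
}

// Example link path (assumed value for illustration only).
echo buildSiteLinkUrl( 'Category:Pretty flowers', 'https://en.wikipedia.org/wiki/$1' ), "\n";
// https://en.wikipedia.org/wiki/Category:Pretty_flowers
```

This keeps the existing str_replace on the link path and only swaps the encoding step, so it would not need any of the configuration that Title depends on.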


TASK DETAIL
https://phabricator.wikimedia.org/T131960
