Smalyshev added a comment.
I'd like to resolve this issue one way or another. Here is what we have, for a page named "Category:Pretty flowers":
- $siteLink->getPageName() returns the page name as-is: Category:Pretty flowers
- the RDF generator now applies rawurlencode, producing: Category%3APretty%20flowers
- Title->getCanonicalURL produces: Category:Pretty_flowers
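Both Title and the RDF builder percent-encode non-ASCII characters the same way, but they differ on spaces (_ vs. %20) and on some ASCII characters such as the colon (: vs. %3A).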
Currently, the RDF builder constructs the URL either like this:
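    $baseUrl = str_replace( '$1', rawurlencode( $siteLink->getPageName() ), $site->getLinkPath() );
    if ( !parse_url( $baseUrl, PHP_URL_SCHEME ) ) {
        // No scheme in the link path (protocol-relative): prepend one; 'http:' is assumed here
        $url = 'http:' . $baseUrl;
    } else {
        $url = $baseUrl;
    }

or like this: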
    global $wgArticlePath;
    $url = str_replace( '$1', rawurlencode( $title->getPrefixedText() ), $wgArticlePath );

Neither matches what Title does. This can be annoying, especially since you can't copy a page's URL and use it directly in a SPARQL query.
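To make the mismatch concrete, here is a minimal PHP sketch of the first variant above with the scheme handling left out. currentRdfSiteLinkUrl() is a hypothetical name, and the en.wikipedia.org link path is only an assumed example:

```php
<?php
// Sketch of the current sitelink URL construction, minus the scheme handling:
// the page name goes through rawurlencode() and is substituted for '$1'
// in the site's link path.
// currentRdfSiteLinkUrl() is a hypothetical name, not a Wikibase function.
function currentRdfSiteLinkUrl(string $pageName, string $linkPath): string
{
    return str_replace('$1', rawurlencode($pageName), $linkPath);
}

// Assumed example link path, for illustration only.
$linkPath = 'https://en.wikipedia.org/wiki/$1';

echo currentRdfSiteLinkUrl('Category:Pretty flowers', $linkPath), "\n";
// prints https://en.wikipedia.org/wiki/Category%3APretty%20flowers
```

For the same page, Title->getCanonicalURL ends in Category:Pretty_flowers, so a URL copied from the page will not string-match the RDF one in a SPARQL query.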
Should we change the behavior to match what Title does? Can we do that reliably? SiteLinksRdfBuilder may not have access to the configuration needed to actually use Title, but maybe we could simulate it. Should we use wfUrlencode instead of rawurlencode?
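If we go the simulation route, something like the following PHP sketch might be enough for the encoding part. It assumes Title's URL form is the prefixed page name with spaces turned into underscores and then passed through wfUrlencode(), whose set of re-decoded characters is copied here. titleStyleUrlencode() is a hypothetical helper, not an existing MediaWiki or Wikibase function, MediaWiki's IIS 7 special case for ':' is ignored, and the link path is an assumed example:

```php
<?php
// Hypothetical helper: approximates the URL encoding Title applies,
// without needing a Title object or site configuration.
function titleStyleUrlencode(string $pageName): string
{
    // Title URLs use underscores instead of spaces.
    $s = strtr($pageName, ' ', '_');
    // urlencode() percent-encodes UTF-8 bytes exactly like rawurlencode();
    // it only differs for spaces (which are gone by now) and '~'.
    $s = urlencode($s);
    // Decode back the characters wfUrlencode() leaves readable.
    // (MediaWiki's IIS 7 exception for ':' is ignored here.)
    return str_ireplace(
        ['%3B', '%40', '%24', '%21', '%2A', '%28', '%29', '%2C', '%2F', '%7E', '%3A'],
        [';', '@', '$', '!', '*', '(', ')', ',', '/', '~', ':'],
        $s
    );
}

$linkPath = 'https://en.wikipedia.org/wiki/$1'; // assumed example link path
foreach (['Category:Pretty flowers', 'Kategorie:Schöne Blumen'] as $name) {
    echo str_replace('$1', titleStyleUrlencode($name), $linkPath), "\n";
}
// prints https://en.wikipedia.org/wiki/Category:Pretty_flowers
// prints https://en.wikipedia.org/wiki/Kategorie:Sch%C3%B6ne_Blumen
```

The non-ASCII part (%C3%B6) comes out the same as with rawurlencode; only spaces and the listed ASCII characters change. This covers encoding only. It does not reproduce Title's normalization (namespace names, first-letter capitalization), which is presumably where the site configuration would still matter.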
To: Smalyshev
Cc: Smalyshev, JanZerebecki, aude, daniel, Christopher, Aklapper, GoranSMilovanovic, QZanden, EBjune, merbst, Avner, debt, Gehel, Jonas, FloNight, Xmlizer, Izno, jkroll, Wikidata-bugs, Jdouglas, Deskana, Manybubbles, Mbch331
_______________________________________________
Wikidata-bugs mailing list
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
