Lucas_Werkmeister_WMDE added a comment.
In T327514#8630437 <https://phabricator.wikimedia.org/T327514#8630437>, @Nikki wrote:

> In T327514#8598366 <https://phabricator.wikimedia.org/T327514#8598366>, @ItamarWMDE wrote:
>
>> - In order to protect against malicious queries (these should never be in real sitelink URLs), don’t decode (or, re-encode after decoding) stuff like
>>   - whitespace characters
>>   - control characters
>
> Some of those characters are required by other scripts and therefore do appear in real sitelink URLs. Zero-width joiner <https://en.wikipedia.org/wiki/Zero-width_joiner> and zero-width non-joiner <https://en.wikipedia.org/wiki/Zero-width_non-joiner> in particular can be relatively common in Arabic and Indic scripts, and `select * { ?sitelink schema:isPartOf <https://fa.wikisource.org/> } limit 1000` includes quite a few zero-width non-joiners.

Hm, true, this doesn’t look nice :/

F36863674: image.png <https://phabricator.wikimedia.org/F36863674>

Let me see if there’s a more restrictive Unicode category we can use.

TASK DETAIL
https://phabricator.wikimedia.org/T327514
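
For illustration only, here is a minimal Python sketch of what "a more restrictive Unicode category" could mean. The thread does not say which categories were eventually chosen, so the set below (`Cc` for control characters, `Zs`/`Zl`/`Zp` for separators) and the function name `prettify_title` are assumptions, not the actual implementation. The zero-width joiner and non-joiner are in general category `Cf` (format), so filtering on `Cc` and `Z*` instead of everything in `C*` would leave them decoded.

```python
import unicodedata
from urllib.parse import quote, unquote

# Assumption: only these general categories stay percent-encoded.
# Cc = control characters; Zs/Zl/Zp = space, line and paragraph separators.
KEEP_ENCODED = {"Cc", "Zs", "Zl", "Zp"}


def prettify_title(encoded: str) -> str:
    """Decode a percent-encoded title, then re-encode risky characters.

    Hypothetical helper: characters whose Unicode general category is in
    KEEP_ENCODED are percent-encoded again; everything else, including
    format characters (Cf) such as ZWJ and ZWNJ, is left decoded.
    """
    decoded = unquote(encoded)
    return "".join(
        quote(ch, safe="") if unicodedata.category(ch) in KEEP_ENCODED else ch
        for ch in decoded
    )


# ZWNJ (U+200C) and ZWJ (U+200D) are format characters, not control characters.
print(unicodedata.category("\u200c"), unicodedata.category("\u200d"))
# A Persian word containing a ZWNJ is decoded in full.
print(prettify_title("%D9%85%DB%8C%E2%80%8C%D8%B4%D9%88%D8%AF"))
# Space and newline stay percent-encoded.
print(prettify_title("a%20b%0Ac"))
```

A literal `%` produced by decoding `%25` is not re-encoded in this sketch, so real code would need to handle that case separately.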
