I've been working on html2unicode over the last few days and stumbled upon
the fact that &amp; also works as a plain ampersand, so that &amp;amp;, for
example, gets converted into &. The commit which introduced this into
core (fc61025 [1]) is not really descriptive, so I searched compat's code
and found the corresponding commit f97dfb0 [2].
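
To make the behaviour concrete, here is a minimal sketch of an entity
decoder with and without an optional "amp;" after the leading ampersand.
It is not pywikibot's code: the names LENIENT, STRICT and decode_entities
are made up for illustration, and the patterns only assume that the
lenient variant tolerates one extra "amp;".

```python
import re
from html.entities import name2codepoint

# Hypothetical patterns for illustration only (not copied from pywikibot).
_BODY = r'(?:#(?P<dec>\d+)|#x(?P<hex>[0-9a-fA-F]+)|(?P<name>[A-Za-z]+));'
LENIENT = re.compile(r'&(?:amp;)?' + _BODY)  # tolerates one extra "amp;"
STRICT = re.compile(r'&' + _BODY)            # decodes exactly one level


def decode_entities(text, pattern):
    """Replace HTML entities matched by ``pattern`` with their characters."""
    def repl(match):
        if match.group('dec') is not None:
            codepoint = int(match.group('dec'))
        elif match.group('hex') is not None:
            codepoint = int(match.group('hex'), 16)
        else:
            codepoint = name2codepoint.get(match.group('name'))
        if codepoint is None:
            return match.group(0)  # unknown name: leave the text untouched
        try:
            return chr(codepoint)
        except (ValueError, OverflowError):
            return match.group(0)  # number outside the Unicode range
    return pattern.sub(repl, text)


print(repr(decode_entities('&amp;amp;', LENIENT)))   # one pass, two levels
print(repr(decode_entities('&amp;amp;', STRICT)))    # one pass, one level
print(repr(decode_entities('&amp;nbsp;', LENIENT)))
print(repr(decode_entities('&amp;nbsp;', STRICT)))
```

With the lenient pattern a doubly encoded entity collapses in a single
pass, while the strict pattern only removes one level of encoding.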

It links to the discussion on @xqt's talk page [3], which doesn't
really explain what is happening there. The API never returns HTML
entities unless they are part of a page's content. I've been testing such
links [4]: [[&amp;]] does work but [[&amp;amp;]] does not. Likewise, the
entity &nbsp; gets properly converted, but [[&amp;nbsp;]] is also
converted only once.

My question is why this is necessary, especially in core, which only
makes API requests that shouldn't suffer from such a problem, so it could
probably be changed. The only reason I can see is that something is
decoding text improperly and converts &nbsp; into &amp;nbsp;, which
shouldn't be our concern.

Fabian

[1]: 
https://github.com/wikimedia/pywikibot-core/commit/fc6102527e4c556cd77aa87736869a3510b0b7d5
[2]: 
https://git.wikimedia.org/blobdiff/pywikibot%2Fcompat.git/f97dfb0d1ca49751ccf615e5082c1c4df5655d40/wikipedia.py
[3]: 
https://de.wikipedia.org/w/index.php?title=Benutzer_Diskussion%3AXqt&action=historysubmit&diff=96907484&oldid=96904091
[4]: https://en.wikipedia.org/wiki/User:XZise/linktest
