[Pywikipedia-l] Recursive HTML entities

Fabian Neundorf Thu, 12 Mar 2015 13:44:16 -0700

I've been working on html2unicode over the last few days and stumbled
upon the fact that &amp; is also treated as a normal ampersand, so that
&amp;amp;, for example, gets converted into &. The commit which
introduced this into core (fc61025 [1]) is not really descriptive, so I
searched compat's code and found the corresponding commit f97dfb0
[2].


There it links to the discussion on @xqt's talk page [3], which doesn't
really explain what is happening. The API never returns HTML
entities unless they are part of a page's content. I've been testing
such a link [4]: [[&amp;]] does work, but [[&amp;amp;]] does not. The
entity &nbsp; is also handled properly, but [[&amp;nbsp;]] is likewise
only decoded once.
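
To make the difference concrete, here is a minimal sketch of single-pass
versus repeated decoding. It uses Python 3's html.unescape from the
standard library as a stand-in, not html2unicode itself, and the helper
names decode_once and decode_repeatedly are made up for illustration.

```python
import html


def decode_once(text):
    """Decode HTML entities in a single pass (stand-in for one decoding step)."""
    return html.unescape(text)


def decode_repeatedly(text):
    """Keep decoding until the text stops changing.

    This mimics the "recursive" behaviour in question, where an already
    decoded ampersand can start a new entity on the next pass.
    """
    previous = None
    while text != previous:
        previous = text
        text = html.unescape(text)
    return text


if __name__ == "__main__":
    for sample in ("&amp;", "&amp;amp;", "&amp;nbsp;"):
        print(repr(sample), repr(decode_once(sample)), repr(decode_repeatedly(sample)))
```

With a single pass, &amp;amp; becomes &amp; and &amp;nbsp; becomes the
literal text &nbsp;, which matches what I see for the links on the test
page. Only the repeated variant turns them into & and a non-breaking
space.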

My question is why this is necessary, especially in core, which only
does API requests and so shouldn't suffer from such a problem; it could
probably be changed. The only reason I can see is that something is
decoding text improperly and converting &nbsp; into &amp;nbsp;, which
shouldn't be our concern.

Fabian

[1]: https://github.com/wikimedia/pywikibot-core/commit/fc6102527e4c556cd77aa87736869a3510b0b7d5
[2]: https://git.wikimedia.org/blobdiff/pywikibot%2Fcompat.git/f97dfb0d1ca49751ccf615e5082c1c4df5655d40/wikipedia.py
[3]: https://de.wikipedia.org/w/index.php?title=Benutzer_Diskussion%3AXqt&action=historysubmit&diff=96907484&oldid=96904091
[4]: https://en.wikipedia.org/wiki/User:XZise/linktest

_______________________________________________
Pywikipedia-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l