Re: [Pywikipedia-l] Encoding in HTML source

Andre Engels Mon, 07 Mar 2011 04:49:10 -0800

On Mon, Mar 7, 2011 at 1:22 PM, Bináris <[email protected]> wrote:
> Hi,
>
> when I download a page in HTML, which contains titles of articles, these
> titles are something like urlencode()-ed, but not quite; characters like
> "(", ")", "!", ",", ":" appear without encoding.
>
> For example:
> <li><a
> href="/w/index.php?title=Avant_l%27aurore_(court-m%C3%A9trage)&amp;action=edit&amp;redlink=1"
> class="new" title="Avant l'aurore (court-métrage) (page does not
> exist)">Avant l'aurore (court-métrage)</a></li>
>
> Is there a function in pywiki to handle this, or is a full list of
> non-encoded characters available? I used urlencode() + a dict of known
> exceptions, but this is not the best solution.


>>> import wikipedia
>>> page = wikipedia.Page(wikipedia.getSite(),
...                       "Avant_l%27aurore_(court-m%C3%A9trage)")
>>> page.urlname()
'Avant_l%27aurore_%28court-m%C3%A9trage%29'
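
Note that urlname() percent-encodes the parentheses too, so its output is
not byte-for-byte the string found in the href. If the goal is to get from
the href back to the plain title, or to rebuild the href form from a title,
something like the following standard-library sketch may be enough. It is
written for Python 3 (urllib.parse); decode_title, encode_title and
SAFE_CHARS are hypothetical names, not part of pywikipedia, and the safe set
contains only the characters reported in the question, so the set MediaWiki
actually uses may well be larger.

```python
from urllib.parse import quote, unquote

# Assumption: only the characters named in the question are listed here.
# MediaWiki may leave further characters unencoded.
SAFE_CHARS = "()!,:"


def decode_title(href_title):
    """Turn the title parameter of an href into a readable page title."""
    # unquote() decodes the %XX escapes as UTF-8; MediaWiki writes
    # spaces as underscores in URLs, so map them back afterwards.
    return unquote(href_title).replace("_", " ")


def encode_title(title, safe=SAFE_CHARS):
    """Rebuild the href-style form of a title, leaving `safe` unencoded."""
    # Spaces become underscores first, then everything outside `safe`
    # (and outside quote()'s always-safe set) is percent-encoded.
    return quote(title.replace(" ", "_"), safe=safe)


href = "Avant_l%27aurore_(court-m%C3%A9trage)"
title = decode_title(href)
print(title)
print(encode_title(title) == href)
```

Decoding first and comparing plain titles avoids having to know the exact
exception list at all; encode_title is only needed when the href form itself
has to be reproduced.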



-- 
André Engels, [email protected]
