On Mon, Mar 7, 2011 at 1:22 PM, Bináris <[email protected]> wrote:
> Hi,
>
> when I download a page in HTML, which contains titles of articles, these
> titles are something like urlencode()-ed, but not quite; characters like
> "(", ")", "!", ",", ":" appear without encoding.
>
> For example:
> <li><a
> href="/w/index.php?title=Avant_l%27aurore_(court-m%C3%A9trage)&action=edit&redlink=1"
> class="new" title="Avant l'aurore (court-métrage) (page does not
> exist)">Avant l'aurore (court-métrage)</a></li>
>
> Is there a function in pywiki to handle this, or is there available a full
> list of non-encoded characters? I used urlencode() + a dict of known
> exceptions, but this is not the best solution.

>>> page = wikipedia.Page(wikipedia.getSite(),
...                       "Avant_l%27aurore_(court-m%C3%A9trage)")
>>> page.urlname()
'Avant_l%27aurore_%28court-m%C3%A9trage%29'

--
André Engels, [email protected]

_______________________________________________
Pywikipedia-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
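
Note that urlname() in the session above also encodes the parentheses (%28, %29), whereas the href in the question leaves them as they are. If the goal is to reproduce the form seen in the HTML, one possible approach is the standard library's quote() with an explicit set of safe characters. This is only a sketch under assumptions: it uses Python 3's urllib.parse rather than anything in pywikipedia, the helper names encode_title and decode_title are made up for illustration, and SAFE_CHARS contains only the characters mentioned in the question, not a confirmed complete list of what MediaWiki leaves unencoded.

```python
from urllib.parse import quote, unquote

# Characters the question reports as appearing unencoded in the HTML.
# Assumption: illustrative only, not MediaWiki's full list of exceptions.
SAFE_CHARS = "()!,:"


def encode_title(title):
    """Hypothetical helper: turn a display title into its URL form.

    Spaces become underscores, then everything except letters, digits,
    "_.-~" and SAFE_CHARS is percent-encoded as UTF-8.
    """
    return quote(title.replace(" ", "_"), safe=SAFE_CHARS)


def decode_title(encoded):
    """Hypothetical helper: reverse encode_title for display purposes."""
    return unquote(encoded).replace("_", " ")


if __name__ == "__main__":
    print(encode_title("Avant l'aurore (court-métrage)"))
    print(decode_title("Avant_l%27aurore_(court-m%C3%A9trage)"))
```

Passing the exceptions through the safe argument avoids the separate dict of known exceptions described in the question, but the set still has to be extended by hand if other unencoded characters turn up.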
