Simon Forman wrote:
> I want to take a webpage, find all URLs (links, img src, etc.) and
> rewrite them in-place, and I'd like to do it in python (pure python
> preferred.)

lxml.html has functions specifically for this problem.

http://codespeak.net/lxml/lxmlhtml.html#working-with-links

Code would be something like

    import lxml.html

    # parse() returns an ElementTree; rewrite_links() lives on the root element
    html_doc = lxml.html.parse("http://.../xyz.html").getroot()
    html_doc.rewrite_links( ... )
    # encoding="unicode" makes tostring() return str rather than bytes
    print(lxml.html.tostring(html_doc, encoding="unicode"))

It also handles links in CSS or JavaScript, as well as broken HTML documents.

Stefan
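
For illustration, here is a minimal sketch of what a callback passed to rewrite_links() might look like. The base URL, the sample markup, and the make_absolute() helper are all made up for this example and are not part of the original question. The callback simply resolves each link against a base URL; any function that takes the old link and returns the new one would do.

```python
from urllib.parse import urljoin

import lxml.html

# Hypothetical base URL, used only for this illustration.
BASE_URL = "http://example.com/docs/"

# Made-up sample markup with a relative link, a rooted image path,
# and a fragment-only link.
SAMPLE = (
    '<div><a href="page.html">link</a>'
    '<img src="/img/logo.png">'
    '<a href="#top">top</a></div>'
)


def make_absolute(link):
    """Resolve a link against BASE_URL, leaving fragment-only links alone."""
    if link.startswith("#"):
        return link
    return urljoin(BASE_URL, link)


doc = lxml.html.fromstring(SAMPLE)
# rewrite_links() calls the function once per link found in the document
# and replaces the link in place with the returned value.
doc.rewrite_links(make_absolute)
result = lxml.html.tostring(doc, encoding="unicode")
print(result)
```

If resolving against a base URL is all that is needed, lxml.html also provides make_links_absolute(), which does this without a custom callback.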