[python-north-west] Parsing html using lxml ...

Stuart Grimshaw Sun, 15 Mar 2009 17:35:50 -0700

Can anyone put their finger on what's going on here?

I'm trying to parse a web page from the local council's site using lxml,
but as soon as it gets as far as parsing the HTML the program just hangs.
The test script is below and, well, I'm stumped. It even freezes if I
point it at the BBC's site instead.

What am I missing?



#!/usr/bin/env python
# encoding: utf-8

import sys
import os
import httplib
from lxml import html

def main():
        print "Getting ward list"
        host = "www.sheffield.gov.uk"
        councillorURL = "/your-city-council/councillors"

        conn = httplib.HTTPConnection(host)
        conn.request("GET", councillorURL)
        r = conn.getresponse()

        print "Retrieved ward index page"

        self.dom = html.parse(r.read())

if __name__ == '__main__':
        main()
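
A possible cause, offered as a sketch rather than a confirmed diagnosis:
html.parse() expects a filename, a URL or a file-like object, so handing it
the string returned by r.read() makes lxml treat the whole page body as the
name of something to open. For markup that is already in memory,
html.fromstring() is the matching call. The example below assumes Python 3
and uses a made-up two-item page in place of the council's real HTML (the
`page` bytes and the ward names are placeholders). Note also that `self` is
not defined inside main(), so the result needs a plain local name.

```python
from io import BytesIO

from lxml import html

# Placeholder for the bytes that r.read() would return; nothing is fetched here.
page = b"<html><body><ul><li>Ward A</li><li>Ward B</li></ul></body></html>"

# Option 1: parse markup that is already in memory.
root = html.fromstring(page)
wards = [li.text_content() for li in root.findall(".//li")]

# Option 2: html.parse() wants something it can read from, so wrap the bytes
# in a file-like object (an object with a read() method also qualifies).
tree = html.parse(BytesIO(page))
wards_from_tree = [li.text_content() for li in tree.getroot().findall(".//li")]

print(wards)
print(wards_from_tree)
```

In the original script this would mean either `dom = html.fromstring(r.read())`
or `dom = html.parse(r)`, since the response object itself has a read() method.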



-- 
-S

Follow me on Twitter: http://twitter.com/stubbs
Blog: http://stubblog.wordpress.com
My art: http://stuartgrimshaw.imagekind.com
Stock Images: http://psc.photoshelter.com/user/stuartgrimshaw

