Hi everyone,

Thanks for all your suggestions. I read up on gzip and urllib, and in the process learned that I could use urllib2, as it's the newer form of that library.
Herewith my solution. I don't know how elegant it is, but it works just fine.

    import gzip
    import os
    import StringIO
    import urllib2
    from xml.etree import ElementTree as ET

    # Contest and MEDIA_ROOT are defined elsewhere in my project.
    CONTEST_FIELDS = ('name', 'league', 'acro', 'time', 'home', 'away')

    def get_contests():
        url = 'http://xml.matchbook.com/xmlfeed/feed?sport-id=&vendor=TEST&sport-name=&short-name=Po'
        req = urllib2.Request(url)
        # The server only returns the feed if this header is set.
        req.add_header('accept-encoding', 'gzip,deflate')
        opener = urllib2.build_opener()
        response = opener.open(req)

        # Decompress the gzipped response body in memory.
        compressed_data = response.read()
        compressed_stream = StringIO.StringIO(compressed_data)
        gzipper = gzip.GzipFile(fileobj=compressed_stream)
        data = gzipper.read()

        # Write the XML to a temporary file and parse it from there.
        current_path = os.path.realpath(MEDIA_ROOT + '/xml-files/d.xml')
        data_file = open(current_path, 'wb')
        data_file.write(data)
        data_file.close()
        xml_data = ET.parse(current_path)

        contest_list = []
        for contest_parent_node in xml_data.getiterator('contest'):
            contest = Contest()
            for contest_child_node in contest_parent_node:
                # Copy only the known fields that have non-empty text.
                if (contest_child_node.tag in CONTEST_FIELDS
                        and contest_child_node.text):
                    setattr(contest, contest_child_node.tag,
                            contest_child_node.text)
            contest_list.append(contest)

        # Clean up the temporary file; ignore it if it is already gone.
        try:
            os.remove(current_path)
        except OSError:
            pass
        return contest_list

Many thanks!
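For anyone reading this on Python 3, here is a rough sketch of the same idea without the temporary file. It is only an illustration under a few assumptions: the helper names (build_request, decode_body, parse_contests) are made up for this example, contests are returned as plain dicts rather than Contest objects, and the sample XML is invented, since the real feed layout is only known from the tag names used above. It does not touch the network.

```python
import gzip
import urllib.request
import xml.etree.ElementTree as ET

# Tag names taken from the solution above; the feed layout itself is assumed.
FIELDS = ("name", "league", "acro", "time", "home", "away")


def build_request(url):
    """Return a Request carrying the header the server asks for."""
    req = urllib.request.Request(url)
    req.add_header("Accept-Encoding", "gzip,deflate")
    return req


def decode_body(raw):
    """Gunzip raw bytes if they start with the gzip magic number."""
    if raw[:2] == b"\x1f\x8b":
        return gzip.decompress(raw)
    return raw


def parse_contests(xml_bytes):
    """Return one dict per <contest>, keeping only non-empty known fields."""
    root = ET.fromstring(xml_bytes)
    contests = []
    for node in root.iter("contest"):
        contest = {}
        for child in node:
            if child.tag in FIELDS and child.text:
                contest[child.tag] = child.text
        contests.append(contest)
    return contests


if __name__ == "__main__":
    # Hypothetical payload standing in for the real (gzipped) feed.
    sample = b"<feed><contest><name>A v B</name><league/></contest></feed>"
    print(parse_contests(decode_body(gzip.compress(sample))))
```

With a live feed, the bytes passed to decode_body would come from urllib.request.urlopen(build_request(url)).read(); parsing straight from memory avoids having to write and then delete d.xml.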
On Tue, May 24, 2011 at 12:35 PM, Stefan Behnel <stefan...@behnel.de> wrote:

> Sithembewena Lloyd Dube, 24.05.2011 11:59:
>
>> I am trying to parse an XML feed and display the text of each child node
>> without any success. My code in the python shell is as follows:
>>
>> >>> import urllib
>> >>> from xml.etree import ElementTree as ET
>>
>> >>> content = urllib.urlopen('http://xml.matchbook.com/xmlfeed/feed?sport-id=&vendor=TEST&sport-name=&short-name=Po')
>> >>> xml_content = ET.parse(content)
>>
>> I then check the xml_content object as follows:
>>
>> >>> xml_content
>> <xml.etree.ElementTree.ElementTree instance at 0x01DC14B8>
>
> Well, yes, it does return an XML document, but not what you expect:
>
> >>> urllib.urlopen('URL see above').read()
> "<response>\r\n <error-message>you must add 'accept-encoding' as
> 'gzip,deflate' to the header of your request</error-message>\r
> \n</response>"
>
> Meaning, the server forces you to pass an HTTP header to the request in
> order to receive gzip compressed data. Once you have that, you must
> decompress it before passing it into ElementTree's parser. See the
> documentation on the gzip and urllib modules in the standard library.
>
> Stefan
>
> _______________________________________________
> Tutor maillist - Tutor@python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor

--
Regards,
Sithembewena Lloyd Dube