On Sep 10, 12:20 pm, jakecjacobson <jakecjacob...@gmail.com> wrote:
> I am trying to build a Python script that reads a Sitemap file and
> pushes the URLs to a Google Search Appliance. I am able to fetch the
> XML document and parse it with regular expressions, but I want to move
> to using native XML tools for this. The problem is that if I use
> urllib.urlopen(url) I can convert the IO stream to an XML document,
> but if I use urllib2.urlopen and then read the response, I get the
> content; when I then call minidom.parse() I get an "IOError:
> [Errno 2] No such file or directory" error.
Hello,

This may not be helpful, but I note that you are doing two different
things with your requests, and judging from the documentation, the
objects returned by the urllib and urllib2 openers do not appear to be
the same. I don't know why you are calling urllib.urlopen(url) in one
case and urllib2.urlopen(request) in the other, but I can tell you that
I have used a urllib2 opener to retrieve a web-services document in XML
and then parse it with minidom.parse().

> # THIS WORKS but will have issues if the IO stream is a compressed file
> def GetPageGuts(net, url):
>     pageguts = urllib.urlopen(url)
>     xmldoc = minidom.parse(pageguts)
>     return xmldoc
>
> # THIS DOESN'T WORK, but I don't understand why
> def GetPageGuts(net, url):
>     request = getRequest_obj(net, url)
>     response = urllib2.urlopen(request)
>     response.headers.items()
>     pageguts = response.read()

Did you note that the documentation says:

"One caveat: the read() method, if the size argument is omitted or
negative, may not read until the end of the data stream; there is no
good way to determine that the entire stream from a socket has been
read in the general case."

A missing EOF marker might be the cause of the parsing problem.

Thanks.

mp
--
http://mail.python.org/mailman/listinfo/python-list
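For what it's worth, another possibility: minidom.parse() expects a
file name or a file-like object, not the document text itself. In the
second function, response.read() returns the content as a string, so
parse() interprets that string as a path, which would produce exactly
the "IOError: [Errno 2] No such file or directory" you quoted. A
minimal sketch of the two working alternatives (io.BytesIO stands in
for the response object here, so it runs without a network connection;
the sitemap content is made up for illustration):

```python
import io
from xml.dom import minidom

# Stand-in for the bytes a sitemap fetch would return.
xml_bytes = (b'<?xml version="1.0"?>'
             b'<urlset><url><loc>http://example.com/</loc></url></urlset>')

# Works: parse() accepts a file-like object, which is what
# urllib2.urlopen() returns -- pass the response itself, unread.
doc_from_stream = minidom.parse(io.BytesIO(xml_bytes))

# Also works: if you have already called read(), use parseString()
# on the resulting content instead of parse().
doc_from_string = minidom.parseString(xml_bytes)

print(doc_from_stream.getElementsByTagName('loc')[0].firstChild.data)
print(doc_from_string.getElementsByTagName('loc')[0].firstChild.data)
```

In other words, either pass the response object straight to
minidom.parse(), or keep the read() call and switch to
minidom.parseString().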