Re: Processing XML that's embedded in HTML

2008-01-23 Thread Stefan Behnel
Mike Driscoll wrote:
> Both the normal parser example and the objectify example you gave me
> give a traceback as follows:
>
> Traceback (most recent call last):
>   File "\\clippy\xml_parser2.py", line 70, in -toplevel-
>     for row in tree.iterfind("//Row"):
> AttributeError: 'etree._ElementTre

Re: Processing XML that's embedded in HTML

2008-01-23 Thread Mike Driscoll
On Jan 22, 5:31 pm, Paul McGuire <[EMAIL PROTECTED]> wrote:
> On Jan 22, 10:57 am, Mike Driscoll <[EMAIL PROTECTED]> wrote:
> > Hi,
>
> > I need to parse a fairly complex HTML page that has XML embedded in
> > it. I've done parsing before with the xml.dom.minidom module on just
> > plain XML, but I ca

Re: Processing XML that's embedded in HTML

2008-01-23 Thread Mike Driscoll
Stefan,

> I would really encourage you to use the normal parser here instead of
> iterparse().
>
> from lxml import etree
> parser = etree.HTMLParser()
>
> # parse the HTML/XML melange
> tree = etree.parse(filename, parser)
>
> # if you want, you can construct a pure XML document
> ro
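
The quoted suggestion is cut off in this excerpt, so the lxml code that followed is not reproduced here. As a rough, standard-library-only sketch of the "construct a pure XML document" idea (not the lxml approach from the thread), one could pull the row fragments out of the page and parse them with `xml.etree.ElementTree`. The sample page, the `Name` element, and the helper `rows_as_xml` are all hypothetical; only the `Row` element name comes from the thread. The sketch also assumes each `<Row>...</Row>` fragment is itself well-formed XML, which a real page may not guarantee.

```python
import re
import xml.etree.ElementTree as ET

# Non-greedy match of each <Row>...</Row> fragment (assumed well-formed XML).
ROW_PATTERN = re.compile(r"<Row\b.*?</Row>", re.DOTALL)


def rows_as_xml(html_text):
    """Wrap every <Row> fragment in a fresh root element and parse it as XML."""
    fragments = ROW_PATTERN.findall(html_text)
    return ET.fromstring("<Rows>" + "".join(fragments) + "</Rows>")


# Hypothetical page: HTML with an XML island in the middle.
PAGE = """
<html><body>
<div id="header">Report<br></div>
<xml id="records">
  <Row><Relationship>Owner</Relationship><Name>Alpha</Name></Row>
  <Row><Relationship>Tenant</Relationship><Name>Beta</Name></Row>
</xml>
</body></html>
"""

root = rows_as_xml(PAGE)
# iterfind() with a relative path walks all Row elements below the root.
names = [row.findtext("Name") for row in root.iterfind(".//Row")]
```

Pulling fragments out with a regular expression is fragile. It is used here only so the sketch runs without third-party packages, whereas the thread itself relies on lxml's HTML parser for this step.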

Re: Processing XML that's embedded in HTML

2008-01-23 Thread Mike Driscoll
John and Stefan,

On Jan 23, 5:33 am, Stefan Behnel <[EMAIL PROTECTED]> wrote:
> Hi,
>
> Mike Driscoll wrote:
> > I got lxml to create a tree by doing the following:
>
> > from lxml import etree
> > from StringIO import StringIO
>
> > parser = etree.HTMLParser()
> > tree = etree.parse(filename, par

Re: Processing XML that's embedded in HTML

2008-01-23 Thread Stefan Behnel
Hi,

Mike Driscoll wrote:
> I got lxml to create a tree by doing the following:
>
> from lxml import etree
> from StringIO import StringIO
>
> parser = etree.HTMLParser()
> tree = etree.parse(filename, parser)
> xml_string = etree.tostring(tree)
> context = etree.iterparse(StringIO(xml_string))

Re: Processing XML that's embedded in HTML

2008-01-22 Thread Paul McGuire
On Jan 22, 10:57 am, Mike Driscoll <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I need to parse a fairly complex HTML page that has XML embedded in
> it. I've done parsing before with the xml.dom.minidom module on just
> plain XML, but I cannot get it to work with this HTML page.
>
> The XML looks like thi

Re: Processing XML that's embedded in HTML

2008-01-22 Thread Paul Boddie
On 22 Jan, 21:48, Mike Driscoll <[EMAIL PROTECTED]> wrote:
> On Jan 22, 11:32 am, Paul Boddie <[EMAIL PROTECTED]> wrote:
>
> > [1]http://www.python.org/pypi/libxml2dom
>
> I must have tried this module quite a while ago since I already have
> it installed. I see you're the author of the module, so

Re: Processing XML that's embedded in HTML

2008-01-22 Thread John Machin
On Jan 23, 7:48 am, Mike Driscoll <[EMAIL PROTECTED]> wrote:
[snip]
> I'm not sure what is wrong here...but I got lxml to create a tree from
> by doing the following:
>
> from lxml import etree
> from StringIO import StringIO
>
> parser = etree.HTMLParser()
> tree = etree.parse(filename, parser

Re: Processing XML that's embedded in HTML

2008-01-22 Thread Mike Driscoll
On Jan 22, 11:32 am, Paul Boddie <[EMAIL PROTECTED]> wrote:
> > The rest of the document is html, javascript div tags, etc. I need the
> > information only from the row where the Relationship tag = Owner and
> > the Priority tag = 1. The rest I can ignore. When I tried parsing it
> > with minidom,
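
The requirement quoted above (keep only the row whose Relationship is Owner and whose Priority is 1) can be sketched with the standard library's `html.parser`, which tolerates the surrounding HTML. This is an illustrative alternative, not the lxml or libxml2dom code discussed in the thread. The sample markup, the `Name` element, and the names `RowCollector` and `owner_priority_rows` are assumptions made for the example, since the original page is not shown in this excerpt.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect each <Row> element as a dict mapping child tag -> text."""

    def __init__(self):
        super().__init__()
        self.rows = []       # completed rows
        self._row = None     # row being built, or None when outside <Row>
        self._field = None   # child tag currently open inside the row

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so <Row> arrives as "row".
        if tag == "row":
            self._row = {}
        elif self._row is not None:
            self._field = tag
            self._row[tag] = ""

    def handle_data(self, data):
        # Only keep text that sits inside a child tag of the current row.
        if self._row is not None and self._field is not None:
            self._row[self._field] += data

    def handle_endtag(self, tag):
        if tag == "row" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._field = None
        elif tag == self._field:
            self._field = None


def owner_priority_rows(html_text):
    """Return rows where Relationship is 'Owner' and Priority is '1'."""
    collector = RowCollector()
    collector.feed(html_text)
    collector.close()
    return [
        row for row in collector.rows
        if row.get("relationship", "").strip() == "Owner"
        and row.get("priority", "").strip() == "1"
    ]


# Hypothetical page: the real markup is not shown in the thread excerpt.
SAMPLE = """
<html><body>
<div id="header">Report</div>
<xml id="records">
  <Row><Relationship>Owner</Relationship><Priority>1</Priority><Name>Alpha</Name></Row>
  <Row><Relationship>Owner</Relationship><Priority>2</Priority><Name>Beta</Name></Row>
  <Row><Relationship>Tenant</Relationship><Priority>1</Priority><Name>Gamma</Name></Row>
</xml>
<script>var ready = true;</script>
</body></html>
"""

matches = owner_priority_rows(SAMPLE)
```

Because the parser lowercases tag names, the resulting dict keys are `relationship`, `priority`, and so on, rather than the mixed-case names used in the page.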

Re: Processing XML that's embedded in HTML

2008-01-22 Thread Paul Boddie
On 22 Jan, 17:57, Mike Driscoll <[EMAIL PROTECTED]> wrote:
>
> I need to parse a fairly complex HTML page that has XML embedded in
> it. I've done parsing before with the xml.dom.minidom module on just
> plain XML, but I cannot get it to work with this HTML page.

It's HTML day on comp.lang.python