On Oct 28, 6:18 pm, Stefan Behnel <[EMAIL PROTECTED]> wrote:
> Felipe De Bene wrote:
> > I'm having problems parsing an HTML file with the following syntax :
> >
> > <TABLE cellspacing=0 cellpadding=0 ALIGN=CENTER BORDER=1 width='100%'>
> > <TH BGCOLOR='#c0c0c0' Width='3%'>User ID</TH>
> > <TH Width='10%' BGCOLOR='#c0c0c0'>Name</TH><TH width='7%'
> > BGCOLOR='#c0c0c0'>Date</TH>
> > and so on....
> >
> > whenever I feed the parser with such file I get the error :
> >
> > HTMLParser.HTMLParseError: bad end tag: "</TH BGCOLOR='#c0c0c0'>", at
> > line 515, column 45
>
> Your HTML page is not HTML, i.e. it is broken. Python's HTMLParser is not made
> for parsing broken HTML. However, you can use the parse of lxml.html to fix up
> your HTML for you.
>
> http://codespeak.net/lxml/
>
> Stefan

Actually, I fetch the page from an application that I thought was supposed to behave like this, and as I told you, the program is ready to be shipped, so rewriting an entire class that has public methods would be a real pain. I really had to find a way to work this out using Python's own parser instead of external libraries.

But thanks anyway for the pointer. I might start working on a similar project next, and this library may be a good and less painful path.

Thanks :D
Felipe.
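
For readers hitting the same "bad end tag" error: one standard-library-only workaround is to strip stray attributes from end tags before feeding the markup to the parser. The sketch below is an assumption about how that could look, not the fix actually used in this thread. It is written against Python 3's html.parser module (the thread used the older HTMLParser module), and the names strip_end_tag_attributes and HeaderCollector are hypothetical.

```python
import re
from html.parser import HTMLParser

# Matches an end tag that carries stray attributes,
# e.g. </TH BGCOLOR='#c0c0c0'>. A well-formed </TH> is left alone.
_BAD_END_TAG = re.compile(r"</\s*([A-Za-z][A-Za-z0-9]*)\s+[^>]*>")


def strip_end_tag_attributes(markup):
    """Rewrite end tags such as </TH BGCOLOR='x'> to plain </TH>."""
    return _BAD_END_TAG.sub(r"</\1>", markup)


class HeaderCollector(HTMLParser):
    """Hypothetical parser subclass that collects the text of <TH> cells."""

    def __init__(self):
        super().__init__()
        self.in_th = False
        self.headers = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lower case.
        if tag == "th":
            self.in_th = True
            self.headers.append("")

    def handle_endtag(self, tag):
        if tag == "th":
            self.in_th = False

    def handle_data(self, data):
        if self.in_th:
            self.headers[-1] += data


# Made-up sample in the style of the markup quoted above.
broken = (
    "<TABLE><TH BGCOLOR='#c0c0c0' Width='3%'>User ID</TH BGCOLOR='#c0c0c0'>"
    "<TH Width='10%'>Name</TH></TABLE>"
)
cleaned = strip_end_tag_attributes(broken)

parser = HeaderCollector()
parser.feed(cleaned)
parser.close()
print(parser.headers)
```

The regular expression is deliberately simple: it would mangle an end tag whose stray attribute value contains a ">" character, so it suits a known, limited kind of breakage rather than arbitrary broken HTML.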