Re: HTMLParser chokes on bad end tag in comment
Edward Elliott wrote:

> Guess you learn something new every day. Too bad there's so much illegal
> code in the wild. :(

if more people learned something new every day, the wild would look a
lot different.

-- http://mail.python.org/mailman/listinfo/python-list
Re: HTMLParser chokes on bad end tag in comment
Fredrik Lundh wrote:

>> Should it? The end tag it chokes on is in comment, isn't it?
>
> no. STYLE and SCRIPT elements contain character data, not parsed
> character data, so comments are treated as characters, and the first
> "</" [...]

http://www.w3.org/TR/html4/appendix/notes.html#notes-specifying-data

  Element content

  When script or style data is the content of an element (SCRIPT and
  STYLE), the data begins immediately after the element start tag and
  ends at the first ETAGO ("</") delimiter followed by a name start
  character ([a-zA-Z]); note that this may not be the element's end
  tag. Authors should therefore escape "</" in the content. [...]

  ILLEGAL EXAMPLE: The following script data incorrectly contains a
  "</" sequence (as part of "</EM>") before the SCRIPT end tag:

      <SCRIPT type="text/javascript">
        document.write ("<EM>This won't work</EM>")
      </SCRIPT>

  In JavaScript, this code can be expressed legally by hiding the ETAGO
  delimiter before an SGML name start character:

      <SCRIPT type="text/javascript">
        document.write ("<EM>This will work<\/EM>")
      </SCRIPT>

Guess you learn something new every day. Too bad there's so much illegal
code in the wild. :(

--
Edward Elliott
UC Berkeley School of Law (Boalt Hall)
complangpython at eddeye dot net
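The escaping rule quoted above can be applied mechanically when script
text is generated from Python. A minimal sketch, assuming every "</" in
the script text sits inside a JavaScript string literal (where "<\/"
means the same as "</"); the helper name escape_etago is made up for
this illustration and is not part of any library:

```python
def escape_etago(script_text):
    """Hide every "</" in script data by writing it as "<\\/"."""
    # Inside a JavaScript string literal "\/" is just "/", so the script
    # behaves the same, but an HTML parser no longer sees an ETAGO
    # delimiter before the real </SCRIPT> end tag.
    return script_text.replace("</", "<\\/")


unsafe = 'document.write ("<EM>This will work</EM>")'
safe = escape_etago(unsafe)
print(safe)  # document.write ("<EM>This will work<\/EM>")
```

This only covers the JavaScript case from the spec excerpt; other
script or style languages have their own escape mechanisms.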
Re: HTMLParser chokes on bad end tag in comment
Miki:
>You can also check out BeautifulSoup
>(http://www.crummy.com/software/BeautifulSoup/) which is less strict
>than the regular HTML parser.

Yes, thanks. In this case it was my sitechecker, which checks for syntax
errors and broken links, so it was supposed to find the syntax error.
BeautifulSoup is not very well suited for validators :-)

--
René Pijlman
Re: HTMLParser chokes on bad end tag in comment
Hello Rene,

You can also check out BeautifulSoup
(http://www.crummy.com/software/BeautifulSoup/) which is less strict
than the regular HTML parser.

HTH,
Miki
http://pythonwise.blogspot.com/
Re: HTMLParser chokes on bad end tag in comment
Fredrik Lundh:
>Rene Pijlman:
>[end tag in html comment in script element]
>The end tag it chokes on is in comment, isn't it?
>
>no. STYLE and SCRIPT elements contain character data, not parsed
>character data, so comments are treated as characters, and the first
>"</" [...]
>
>if you have broken documents, you can tweak this by setting the
>CDATA_CONTENT_ELEMENTS parser attribute before you start parsing.

... and in the meantime that's a good workaround. Thank you very much
Fredrik.

--
René Pijlman
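For anyone finding this thread later, the workaround might look roughly
like this. It is only a sketch, written for Python 3's html.parser (the
thread itself is about the Python 2.4 HTMLParser module), and the class
names RecordingParser and LenientParser and the helper parse are
invented for the example. CDATA_CONTENT_ELEMENTS is the real parser
attribute; emptying it makes SCRIPT and STYLE content get parsed like
ordinary markup:

```python
from html.parser import HTMLParser


class RecordingParser(HTMLParser):
    """Hypothetical helper: records the comments and character data it sees."""

    def __init__(self):
        super().__init__()
        self.comments = []
        self.data = []

    def handle_comment(self, data):
        self.comments.append(data)

    def handle_data(self, data):
        self.data.append(data)


class LenientParser(RecordingParser):
    # Assumption: it is acceptable to treat no element as CDATA for the
    # broken documents at hand. Comments inside SCRIPT and STYLE are then
    # recognised as comments instead of plain characters.
    CDATA_CONTENT_ELEMENTS = ()


def parse(parser_class, document):
    """Feed a whole document to a fresh parser and return the parser."""
    parser = parser_class()
    parser.feed(document)
    parser.close()
    return parser


document = "<script><!-- </b> --></script>"
strict = parse(RecordingParser, document)
lenient = parse(LenientParser, document)
print(strict.comments, lenient.comments)
```

With the default setting the "<!-- ... -->" text is reported through
handle_data as characters; with the empty tuple it is reported through
handle_comment. Overriding the attribute in a subclass, rather than on
HTMLParser itself, keeps the change away from other users of the parser.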
Re: HTMLParser chokes on bad end tag in comment
Rene Pijlman wrote:

> The code below results in an exception (Python 2.4.2):
>
> HTMLParser.HTMLParseError: bad end tag: "", at line 4,
> column 6
>
> Should it? The end tag it chokes on is in comment, isn't it?

no. STYLE and SCRIPT elements contain character data, not parsed
character data, so comments are treated as characters, and the first
"</" [...]