Re: HTMLParser chokes on bad end tag in comment

2006-05-29 Thread Fredrik Lundh

Rene Pijlman wrote:

> The code below results in an exception (Python 2.4.2):
>
> HTMLParser.HTMLParseError: bad end tag: </foo' + 'bar, at line 4,
> column 6
>
> Should it? The end tag it chokes on is in comment, isn't it?

no.  STYLE and SCRIPT elements contain character data, not parsed
character data, so comments are treated as characters, and the first
"</" ends the element.

if you have broken documents, you can tweak this by setting the 
CDATA_CONTENT_ELEMENTS parser attribute before you start parsing.
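
A minimal sketch of that tweak, assuming Python 3's html.parser rather than the Python 2.4 HTMLParser module discussed in this thread (the class name CommentCollector, the parse helper, and the SAMPLE markup are illustrative, not from the original posts). Recent Python 3 versions are more lenient and do not raise HTMLParseError on this input, so the sketch only shows how emptying CDATA_CONTENT_ELEMENTS changes what the parser reports:

```python
from html.parser import HTMLParser

# Sample markup modelled on the script element discussed in this thread.
SAMPLE = """
<html><head><title></title></head><body><script>
<!--
x = '</foo' + 'bar'
// -->
</script></body></html>
"""


class CommentCollector(HTMLParser):
    """Hypothetical helper: records comments and character data."""

    def __init__(self, cdata_elements=None):
        super().__init__()
        self.comments = []
        self.data = []
        if cdata_elements is not None:
            # An instance attribute shadows the class-level default
            # ("script", "style") for this parser only.
            self.CDATA_CONTENT_ELEMENTS = cdata_elements

    def handle_comment(self, text):
        self.comments.append(text)

    def handle_data(self, text):
        self.data.append(text)


def parse(cdata_elements=None):
    """Parse SAMPLE, optionally overriding CDATA_CONTENT_ELEMENTS."""
    parser = CommentCollector(cdata_elements)
    parser.feed(SAMPLE)
    parser.close()
    return parser


default = parse()                    # script body reported as character data
tweaked = parse(cdata_elements=())   # script body parsed like ordinary markup
```

With the default setting the whole script body arrives through handle_data; with an empty tuple the "<!-- ... -->" block is reported through handle_comment instead, so the "</foo" inside it is no longer seen as markup.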

/F

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: HTMLParser chokes on bad end tag in comment

2006-05-29 Thread Rene Pijlman

Fredrik Lundh:
>Rene Pijlman:
>> [end tag in html comment in script element]
>> The end tag it chokes on is in comment, isn't it?
>
> no.  STYLE and SCRIPT elements contain character data, not parsed
> character data, so comments are treated as characters, and the first
> "</" ends the element.

Ah, I see. I'll report the problem to the application that's generating
this broken code (vBulletin forum)...

> if you have broken documents, you can tweak this by setting the
> CDATA_CONTENT_ELEMENTS parser attribute before you start parsing.

... and in the meantime that's a good workaround.

Thank you very much Fredrik.

-- 
René Pijlman

Re: HTMLParser chokes on bad end tag in comment

2006-05-29 Thread Miki

Hello Rene,

You can also check out BeautifulSoup
(http://www.crummy.com/software/BeautifulSoup/) which is less strict
than the regular HTML parser.

HTH,
Miki
http://pythonwise.blogspot.com/


Re: HTMLParser chokes on bad end tag in comment

2006-05-29 Thread Rene Pijlman

Miki:
> You can also check out BeautifulSoup
> (http://www.crummy.com/software/BeautifulSoup/) which is less strict
> than the regular HTML parser.

Yes, thanks. In this case it was my sitechecker which checks for syntax
and broken links, so it was supposed to find the syntax error.
BeautifulSoup is not very well suited for validators :-)

-- 
René Pijlman

Re: HTMLParser chokes on bad end tag in comment

2006-05-29 Thread Edward Elliott

Fredrik Lundh wrote:

>> Should it? The end tag it chokes on is in comment, isn't it?
>
> no.  STYLE and SCRIPT elements contain character data, not parsed
> character data, so comments are treated as characters, and the first
> "</" ends the element.

Rather than take your word for it, I checked the W3C HTML4 DTD and found
this:

http://www.w3.org/TR/html4/appendix/notes.html#notes-specifying-data

Element content

When script or style data is the content of an element (SCRIPT and STYLE),
the data begins immediately after the element start tag and ends at the
first ETAGO ("</") delimiter followed by a name start character ([a-zA-Z]);
note that this may not be the element's end tag. Authors should therefore
escape "</" within the content. Escape mechanisms are specific to each
scripting or style sheet language.

ILLEGAL EXAMPLE:
The following script data incorrectly contains a "</" sequence (as part of
"</EM>") before the SCRIPT end tag:

    <SCRIPT type="text/javascript">
      document.write ("<EM>This won't work</EM>")
    </SCRIPT>

In JavaScript, this code can be expressed legally by hiding the ETAGO
delimiter before an SGML name start character:

    <SCRIPT type="text/javascript">
      document.write ("<EM>This will work<\/EM>")
    </SCRIPT>
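
The rule quoted above can be sketched in Python. This is a rough illustration of the HTML 4 rule only; the helper names first_etago and escape_etago are made up for this example, and parsers that follow HTML5 apply different rules:

```python
import re

# HTML 4 rule: "</" followed by a name start character ends SCRIPT/STYLE data.
ETAGO = re.compile(r"</(?=[a-zA-Z])")


def first_etago(script_data):
    """Return the index where HTML 4 would end the element content, or -1."""
    match = ETAGO.search(script_data)
    return match.start() if match else -1


def escape_etago(script_data):
    """Hide each ETAGO with a backslash, as in the JavaScript example."""
    # In the replacement template, "\\" yields one literal backslash.
    return ETAGO.sub(r"<\\/", script_data)


bad = 'document.write ("<EM>This won\'t work</EM>")'
good = escape_etago(bad)
```

Here first_etago(bad) points at the "</EM>" inside the string literal, while first_etago(good) returns -1. The backslash escape is only safe inside a JavaScript string literal; other contexts and other languages need their own mechanism.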


Guess you learn something new every day.  Too bad there's so much illegal
code in the wild. :(

-- 
Edward Elliott
UC Berkeley School of Law (Boalt Hall)
complangpython at eddeye dot net

Re: HTMLParser chokes on bad end tag in comment

2006-05-29 Thread Fredrik Lundh

Edward Elliott wrote:

> Guess you learn something new every day.  Too bad there's so much illegal
> code in the wild. :(

if more people learned something new every day, the wild would look a 
lot different.

/F



HTMLParser chokes on bad end tag in comment

2006-05-28 Thread Rene Pijlman

The code below results in an exception (Python 2.4.2):

HTMLParser.HTMLParseError: bad end tag: </foo' + 'bar, at line 4,
column 6

Should it? The end tag it chokes on is in comment, isn't it?

import HTMLParser
HTMLParser.HTMLParser().feed("""
<html><head><title></title></head><body><script>
<!--
x = '</foo' + 'bar'
// -->
</script></body></html>
""")

-- 
René Pijlman
