Tom Palmer wrote:
> 
> > only one I know of that deals well with bad html) and quite difficult to
> > implement properly (or so it seems).
> >
> There are two parts to the difficulty of HTML.  One is the parsing.
> The other is the display.
> 
> The parsing is easier than the display, but it still has its pitfalls.
> For example, when a tag isn't closed properly, when do you
> consider it closed?  Individual tags must be taken on a
> case-by-case basis.  Also, overlapping <b> and <i> elements behave
> differently in different browsers.  Whose model do you choose?
> Many more issues exist.

If a tag is not closed but its parent is closed, the tag is forcibly
closed and an error is issued (the parser keeps going). If the tag's
end tag is optional (like P), no error is issued. If the tag is closed
implicitly by another tag (e.g. LI closes another LI; /UL and /OL close
any open LI), it is dealt with properly.
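
As an illustration only, the closing rules above can be sketched with a
stack of open tags. This is a minimal Python sketch, not the actual
parser code: the names balance, OPTIONAL_END and IMPLIED_END are
hypothetical, the tables are deliberately tiny, and the handling of a
stray end tag is an assumption the text above does not cover.

```python
# Hypothetical tables; a real parser would derive them from the HTML DTD.
OPTIONAL_END = {"p", "li"}        # end tag may be omitted without an error
IMPLIED_END = {"li": {"li"}}      # start tag -> open tags it implicitly closes


def balance(tokens):
    """tokens: list of ("start", name) or ("end", name) pairs.

    Returns (events, errors): a balanced event list plus error messages.
    Errors are collected, never raised, so parsing always continues.
    """
    stack, events, errors = [], [], []
    for kind, name in tokens:
        if kind == "start":
            # Implicit close: a new LI closes an LI that is still open.
            if stack and stack[-1] in IMPLIED_END.get(name, ()):
                events.append(("end", stack.pop()))
            stack.append(name)
            events.append(("start", name))
        elif name in stack:
            # Closing a parent forcibly closes everything opened inside it.
            while True:
                top = stack.pop()
                events.append(("end", top))
                if top == name:
                    break
                if top not in OPTIONAL_END:
                    errors.append(f"<{top}> forcibly closed by </{name}>")
        else:
            # Assumption: an end tag with no matching open tag is ignored.
            errors.append(f"stray </{name}> ignored")
    return events, errors
```

Feeding it UL, LI, LI, B, /UL yields a balanced event list and a single
error for the unclosed B; the two LI elements are closed silently.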

HTML and BODY tags are always created whether they exist or not in the
file.

All of this is taken care of; it is most of what an HTML parser is
supposed to do, as opposed to an XML parser, which demands well-formed
documents.

As for overlapping <b>, <i> and <form> (tricky), I use DOM
normalization rather than the specific approach taken by any one parser.
It's a bit easier for a parser to work with <b>/<i> since it need not
create a DOM but just fontify text sections.
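
The exact normalization rule is not spelled out here, so the following
Python sketch shows just one possible way to turn overlapping inline
tags into a properly nested sequence that can be built into a tree:
close the inner elements early, close the requested one, then reopen
the inner ones. The function name nest_properly and the token format
are hypothetical.

```python
def nest_properly(tokens):
    """tokens: ("start", name), ("end", name) or ("text", data) pairs.

    Returns a token list in which start/end tags are strictly nested.
    """
    stack, out = [], []
    for kind, value in tokens:
        if kind == "text":
            out.append((kind, value))
        elif kind == "start":
            stack.append(value)
            out.append((kind, value))
        elif value in stack:
            # Close everything opened after `value`, remembering the order.
            reopened = []
            while stack[-1] != value:
                inner = stack.pop()
                out.append(("end", inner))
                reopened.append(inner)
            stack.pop()
            out.append(("end", value))
            # Reopen the inner elements, outermost first.
            for inner in reversed(reopened):
                stack.append(inner)
                out.append(("start", inner))
        # Assumption: an end tag with no open element is simply dropped.
    return out
```

With this rule, <b>x<i>y</b>z</i> comes out as
<b>x<i>y</i></b><i>z</i>.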

arkin

> 
> Piece of cake, huh.
> 
> - Tom Palmer
