Anne van Kesteren wrote:
> On Wed, 06 Dec 2006 15:13:26 +0100, Sam Ruby <[EMAIL PROTECTED]> wrote:
>> Count me in. This is actually closer to the reason I originally subscribed to this list. If given a few tests, I could convert them into a useful form, and this form could serve as a model for future tests.
>>
>> My original interest was to write a replacement for Python's SGMLLIB, i.e., one based not on the theoretical ideal of how SGML vocabularies work, but on the practical reality of how HTML is actually parsed.
>
> The HTMLTokenizer for such a project is mostly finished already:
>
>   http://code.google.com/p/html5lib/
>
> (As in, it actually emits the tokens it has to. I'm quite happy about it!)
>
> James Graham has been working on the Tree Construction part of the process (called HTMLParser in parser.py), and Lachlan Hunt is working on an HTMLInputStream class that handles some of the specifics needed for the input stream.

I have no interest in participating in a project without test cases.

On the bright side, the license chosen for that work is fine, and -- if there are test cases -- I have no interest in duplicating others' work.

- Sam Ruby
