Re: [whatwg] Provding Better Tools

Michel Fortin Sun, 03 Dec 2006 17:10:54 -0800

On 3 Dec 2006, at 17:04, J. King wrote:

I am. It's not anywhere near finished yet, but the parser so far goes through the whole document and spits out the appropriate tokens; I just haven't done anything with said tokens yet, mainly because I was discouraged by PHP's DOM implementation.
My parser is also slow as molasses, unfortunately.

My experience optimizing PHP Markdown, and building the custom mixed Markdown/HTML-block pseudo-tokenizer of PHP Markdown Extra, tells me that it'll probably stay very slow as long as the implementation is made of PHP code.

Assuming you've implemented the algorithm in the spec as PHP code, you could probably make it faster by using regular expressions in the tokenization steps instead of iterating character by character. For instance, you could implement many of the tokenizer states by matching from the start of a string with a regex. And maybe then it'll also be possible to combine a couple of states within the same regex too.
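
To make the idea concrete, here is a rough sketch of what "matching from the current position with a regex" could look like. It is written in Python rather than PHP only to keep it short, and it is a deliberately tiny illustration, not the spec's tokenizer: the token kinds, the `TOKEN_RE` pattern and the `tokenize` function are all hypothetical names, and attributes, comments, entities and most error handling are ignored. One anchored regex stands in for what would otherwise be several character-by-character states (data, tag open, tag name).

```python
import re

# Hypothetical, heavily simplified pattern: NOT the spec's algorithm.
# Each alternative replaces one or more character-by-character states.
TOKEN_RE = re.compile(
    r"(?P<text>[^<]+)"                     # data state: a run of non-'<' characters
    r"|</(?P<end>[A-Za-z][^\s/>]*)\s*>"    # end tag, e.g. </p>
    r"|<(?P<start>[A-Za-z][^\s/>]*)\s*>"   # start tag without attributes, e.g. <p>
    r"|(?P<stray><)"                       # a '<' that opens no tag: keep it as text
)


def tokenize(html):
    """Return a list of (kind, value) tuples for a tiny subset of HTML."""
    tokens = []
    pos = 0
    while pos < len(html):
        # match() with a start position is anchored there, so the regex
        # engine consumes a whole token instead of one character per loop.
        m = TOKEN_RE.match(html, pos)
        kind = m.lastgroup          # name of the alternative that matched
        value = m.group(kind)
        if kind == "stray":
            tokens.append(("text", value))
        elif kind in ("start", "end"):
            tokens.append((kind, value.lower()))  # tag names are case-insensitive
        else:
            tokens.append((kind, value))
        pos = m.end()               # always advances by at least one character
    return tokens


if __name__ == "__main__":
    for token in tokenize("<p>Hi <b>there</b> a < b</p>"):
        print(token)
```

In PHP the same shape should be reachable with preg_match() and its offset argument, using the A modifier (or \G) so the match is anchored at the offset; I have not measured how much this would gain on a real document.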

The more PHP code we replace with regular expressions, the faster it'll go, but the further we deviate from the processing algorithm described in the spec. I wonder how far we could go while keeping the exact same behaviour.

The truly good solution would be to have a parser implemented in C and available through every standard installation of PHP. It could be used by other languages too.



Michel Fortin
[EMAIL PROTECTED]
http://www.michelf.com/
