On 3 Dec 2006, at 17:04, J. King wrote:
> I am. It's not anywhere near finished yet, but the parser so far
> goes through the whole document and spits out the appropriate
> tokens; I just haven't done anything with said tokens yet, mainly
> because I was discouraged by PHP's DOM implementation.
> My parser is also slow as molasses, unfortunately.
My experience optimizing PHP Markdown, and building the custom mixed
Markdown/HTML-block pseudo-tokenizer of PHP Markdown Extra, tells me
that it'll probably stay very slow as long as the implementation is
written in PHP code.
Assuming you've implemented the algorithm in the spec as PHP code,
you could probably make it faster by using regular expressions in the
tokenization steps instead of iterating character by character. For
instance, you could implement many of the tokenizer states by
matching from the start of a string with a regex. It might then also
be possible to combine a couple of states within the same regex.
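A minimal sketch of the idea, written in Python rather than PHP for brevity: instead of looping over characters, each step matches one compiled pattern at the current offset, and that one pattern combines a "data" state with a tag state through alternation. The token grammar and the names `TOKEN` and `tokenize` are hypothetical and far simpler than the spec's tokenizer (no attributes, comments, or character references). In PHP the analogous step would presumably be `preg_match` with its offset argument and a `\G` anchor at the start of the pattern.

```python
import re

# Hypothetical, heavily simplified grammar (NOT the real spec rules):
# one pattern combines a "data" state (text up to the next '<' or '&')
# with a tag state (start or end tag without attributes).
TOKEN = re.compile(
    r"(?P<data>[^<&]+)"
    r"|<(?P<slash>/?)(?P<name>[A-Za-z][A-Za-z0-9]*)>"
)


def tokenize(text: str) -> list[tuple[str, str]]:
    """Tokenize by matching the combined regex at the current offset."""
    tokens: list[tuple[str, str]] = []
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)  # anchored: only tries a match at pos
        if m is None:
            # Fallback for what the sketch does not model ('&', stray '<'):
            # emit a single character and move on.
            tokens.append(("Character", text[pos]))
            pos += 1
        elif m.group("data") is not None:
            tokens.append(("Character", m.group("data")))
            pos = m.end()
        else:
            kind = "EndTag" if m.group("slash") else "StartTag"
            tokens.append((kind, m.group("name").lower()))
            pos = m.end()
    return tokens
```

For example, `tokenize("<P>Hello &amp; bye</p>")` yields a start tag `p`, the runs `"Hello "`, `"&"` and `"amp; bye"`, then an end tag `p`. Each whole run of text costs one regex call instead of one loop iteration per character, which is where the speed-up would come from.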
The more we replace PHP code with regular expressions, the faster it'll
go, but the further we deviate from the processing algorithm described
in the spec. I wonder how far we could go while keeping the exact same
behaviour.
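One possible way to probe that question, sketched here in Python under simplifying assumptions, is differential testing: keep a character-by-character version of a state as the reference, write the regex version next to it, and compare their results on many random inputs and offsets. The simplified data state and the names `data_state_loop`, `data_state_regex` and `same_behaviour` are hypothetical; agreement on random inputs is evidence of equivalence, not a proof.

```python
import random
import re

# Hypothetical simplified data state: consume everything up to '<' or '&'.
DATA_RUN = re.compile(r"[^<&]*")


def data_state_loop(text: str, pos: int) -> tuple[str, int]:
    """Reference: consume data one character at a time, spec-style."""
    start = pos
    while pos < len(text) and text[pos] not in "<&":
        pos += 1
    return text[start:pos], pos


def data_state_regex(text: str, pos: int) -> tuple[str, int]:
    """Candidate: the same step as one anchored regex match."""
    m = DATA_RUN.match(text, pos)  # [^<&]* always matches, possibly empty
    return m.group(), m.end()


def same_behaviour(trials: int = 1000, seed: int = 0) -> bool:
    """Compare both versions on random strings and random start offsets."""
    rng = random.Random(seed)
    alphabet = "ab <&>\n"
    for _ in range(trials):
        length = rng.randint(0, 20)
        text = "".join(rng.choice(alphabet) for _ in range(length))
        pos = rng.randint(0, len(text))
        if data_state_loop(text, pos) != data_state_regex(text, pos):
            return False
    return True
```

The same harness could be pointed at a regex that merges several states, which is exactly where a subtle behavioural difference would be most likely to slip in.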
The truly good solution would be to have a parser implemented in C and
available through every standard installation of PHP. It could be
used by other languages too.
Michel Fortin
http://www.michelf.com/