on 05/18/2010 07:30 PM Rene Veerman said the following:
> Hi.
> I'm trying to build an HTML analyzer that looks at natural-language words
> in HTML text. I'd like to build a routine that walks through the HTML
> character by character, but I'm not sure how to properly walk through
> escaped " and ' characters in JavaScript or other embedded languages.
> Skipping the first " and ' is no problem, but the escaped " and ' that
> follow can get difficult, in my opinion.
> If you have any ideas on this, I'd like to hear them.

You are better off using something that already exists. HTML parsing is
far from trivial, and if the HTML you are parsing is malformed, things
get worse.
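
To illustrate just the escaping part of the question, here is a minimal
sketch in Python (the function name skip_quoted is hypothetical, not from
any library). It assumes backslash escaping, as JavaScript string literals
use, and walks from an opening quote to the matching closing quote:

```python
def skip_quoted(text: str, start: int) -> int:
    """Return the index just past the string literal opening at text[start].

    Assumes backslash escaping, as in JavaScript string literals.
    """
    quote = text[start]
    if quote not in "\"'":
        raise ValueError("start must point at a quote character")
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            # Skip the backslash and whatever character it escapes.
            i += 2
            continue
        if ch == quote:
            # Unescaped matching quote: the literal ends here.
            return i + 1
        i += 1
    # Unterminated literal: treat the rest of the input as quoted.
    return len(text)
```

This covers only quoted strings. It does not handle JavaScript comments,
regular expression literals, or template literals, and an HTML parser ends
a script element at the first </script> even when that text sits inside a
JavaScript string, which is part of why a hand-written scanner gets
complicated quickly.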

You may want to try this HTML parser package. It can parse HTML, CSS,
DTD, etc. in pure PHP, with no special extensions required. It tolerates
malformed HTML and can even filter out insecure HTML and CSS that may
contain dangerous JavaScript. In fact, that filtering is the main purpose
it was written for.
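
As a rough sketch of the general approach (letting an existing parser
deal with the markup and only looking at the text it hands back), here is
an example using Python's standard html.parser module. This is not the
API of the PHP package mentioned above; the names WordCollector and
extract_words are made up for illustration, and the word pattern is only
an assumption about what counts as a natural word.

```python
import re
from html.parser import HTMLParser

# Assumed definition of a "word": runs of letters, optionally joined by
# apostrophes (so "don't" counts as one word).
WORD_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


class WordCollector(HTMLParser):
    """Collect words from text nodes, ignoring script and style content."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.words = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only text outside script/style elements is scanned for words.
        if self._skip_depth == 0:
            self.words.extend(WORD_RE.findall(data))


def extract_words(html: str) -> list:
    parser = WordCollector()
    parser.feed(html)
    parser.close()
    return parser.words
```

With this approach the escaped quotes inside embedded JavaScript never
need to be walked by hand, because the parser delivers the script body as
a block that the collector simply ignores.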



Manuel Lemos

PHP General Mailing List (http://www.php.net/)