Hi

I do it manually: start from scratch. If I were charging for it (i.e. contracting), I would apply a tag soup multiplier of between 1.1 and N times the estimate for the work, depending on whether it was touched by the Hand of Frontpage :D

You could also do some regex work to remove all tags: strip every substring that starts at a < and ends at the next >, and you'd be left with the content.
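
As a rough sketch of that regex idea (in Python here, only for illustration; the function name strip_all_tags is made up, and any regex engine would do the same job):

```python
import re

# "<", then any run of characters that are not ">", then the next ">".
TAG_RE = re.compile(r"<[^>]*>")

def strip_all_tags(markup: str) -> str:
    """Remove every substring that starts at '<' and ends at the next '>'."""
    return TAG_RE.sub("", markup)

print(strip_all_tags('<p class="MsoNormal"><font size=2>Hello <b>world</b></font></p>'))
# prints: Hello world
```

This is deliberately naive. A literal > inside an attribute value will end the match early, and the text inside script or style elements is kept as if it were content, so treat it as a first pass and not as a parser.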

PHP's strip_tags(string, allowable_tags) will also remove tags from a string while leaving the tags you list as allowable in place:
http://au3.php.net/strip_tags
Read the manual entry for that one. I use it for a junk mail interface, with harmful tags such as object, embed, iframe, frame, form, a etc. (anything related to JavaScript, ActiveX, user input etc.) removed.
Tidy will clean up your tags first, so that strip_tags may work well enough off the bat.
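
To show the allowlist idea in isolation, here is a minimal sketch in Python. It is not how PHP implements strip_tags; the ALLOWED set and the name strip_tags_except are hypothetical, picked just for the example:

```python
import re

# Hypothetical allowlist for illustration: tags considered safe to keep.
ALLOWED = {"p", "br", "b", "i", "em", "strong"}

# Opening or closing tag; group 1 captures the tag name.
# Comments and doctypes start with "<!" and are not matched here.
TAG_RE = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)[^>]*>")

def strip_tags_except(markup: str, allowed=ALLOWED) -> str:
    """Drop every tag whose name is not in `allowed`; keep the text between tags."""
    def keep_or_drop(match):
        return match.group(0) if match.group(1).lower() in allowed else ""
    return TAG_RE.sub(keep_or_drop, markup)

print(strip_tags_except('<p>Hi <a href="x">link</a> <b>bold</b></p><iframe src="y"></iframe>'))
# prints: <p>Hi link <b>bold</b></p>
```

Note that tags on the allowlist are kept whole, attributes included, so an onclick on an allowed tag would survive in this sketch. Running the markup through Tidy first makes the tags regular enough for this kind of filter to behave predictably.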


A Tidy class can be found in the PHP PECL repository at http://pecl.php.net/packages.php?catpid=10&catname=HTML (try the HTML classes at pear.php.net as well).

Cheers
James

Neerav wrote:

I haven't had to do it yet (thank goodness!) but if I did need to, I would use "Tidy"

http://www.w3.org/People/Raggett/tidy/
http://tidy.sourceforge.net/



*****************************************************
The discussion list for http://webstandardsgroup.org/
See http://webstandardsgroup.org/mail/guidelines.cfm
for some hints on posting to the list & getting help
*****************************************************



