Ashley Sheridan wrote:
I'm looking for a way to strip HTML tags out of some text content
(sourced from a web page) to leave just the text which I'll be running
some basic analysis on. The thing is, I want to preserve text that is in
alt and title attributes. I can't use any DOM functions, as I can't
guarantee that the content will be valid XHTML, although it should be
valid HTML.

I'm happy doing this with string functions and regular expressions, but
I was wondering if something for this already existed? The server I plan
on putting this on does not have access to the shell (although it is a
Linux server) so I won't be able to have Lynx or Elinks parse the
content for me either :(

Thanks,
Ash
http://www.ashleysheridan.co.uk

Sounds easy with a simple regular expression, and certainly easier than twisting a class or DOM function to do the job.

How do you want to retain the text that is in the alt and title attributes, and in what form? For example, given <img xxxx alt="foo">, what should end up in the output?
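
If the answer is "just drop the attribute text inline where the tag was", something along these lines might do as a starting point. It is only a rough sketch: the function name strip_tags_keep_attrs() is made up for illustration, it assumes the attribute values are quoted, and it will trip over a literal ">" inside an attribute value, so treat it as untested against real-world pages.

```php
<?php
// Hypothetical helper (not a built-in): removes HTML tags but keeps the
// text of alt and title attributes, placed where the tag used to be.
function strip_tags_keep_attrs(string $html): string
{
    // Drop script/style blocks (contents included) and HTML comments.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1\s*>#is', ' ', $html);
    $html = preg_replace('#<!--.*?-->#s', ' ', $html);

    // Replace each remaining tag with the values of its alt/title attributes.
    $html = preg_replace_callback(
        '#<[^>]*>#',
        function (array $m): string {
            // The lookbehind avoids matching names such as data-alt.
            preg_match_all(
                '#(?<![\w-])(?:alt|title)\s*=\s*(?:"([^"]*)"|\'([^\']*)\')#i',
                $m[0],
                $attrs,
                PREG_SET_ORDER
            );
            $kept = [];
            foreach ($attrs as $a) {
                // Group 1 is a double-quoted value, group 2 a single-quoted one.
                $value = $a[1] !== '' ? $a[1] : ($a[2] ?? '');
                if ($value !== '') {
                    $kept[] = $value;
                }
            }
            // Pad with spaces so neighbouring words do not run together.
            return ' ' . implode(' ', $kept) . ' ';
        },
        $html
    );

    // Decode entities, then collapse runs of whitespace.
    $text = html_entity_decode($html, ENT_QUOTES, 'UTF-8');
    return trim(preg_replace('/\s+/', ' ', $text));
}
```

Called on '<p>Hello <img src="a.png" alt="foo"> world</p>' this should hand back the plain words with "foo" sitting where the image was. Since it uses only the PCRE functions and html_entity_decode(), it needs no shell access.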

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
