Re: [PHP] strip tags but preserve title attributes

2009-12-15 Thread Brady Mitchell
On Tue, Dec 15, 2009 at 6:44 AM, Wouter van Vliet / Interpotential wrote:
> And if that doesn't suit your needs - you might want to take a look at this:
>
>    http://sourceforge.net/projects/simplehtmldom/
+1

I've never used the html2text library, but simplehtmldom is very easy
to use and has worked very well for me.

Brady

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] strip tags but preserve title attributes

2009-12-15 Thread Wouter van Vliet / Interpotential
I've had quite some luck using the html2text class by Jon Abernathy

   http://www.chuggnutt.com/html2text.php

It's targeted at PHP 4 and is rather old code, but it does the job for me.
The job, in my case, is converting HTML to text when I'm sending out
emails in HTML format and want to offer a proper plain-text alternative.

To be honest, I haven't checked how it handles title/alt attributes on
images, but I'm confident that it handles them nicely, and if it doesn't,
you can add that yourself.

And if that doesn't suit your needs - you might want to take a look at this:

http://sourceforge.net/projects/simplehtmldom/

Regards,
Wouter

2009/12/15 Andrew Ballard 

> On Mon, Dec 14, 2009 at 6:43 PM, Ashley Sheridan wrote:
> > I'm looking for a way to strip HTML tags out of some text content
> > (sourced from a web page) to leave just the text which I'll be running
> > some basic analysis on. The thing is, I want to preserve text that is in
> > alt and title attributes. I can't use any DOM functions, as I can't
> > guarantee that the content will be valid XHTML, although it should be
> > valid HTML.
> >
> > I'm happy doing this with string functions and regular expressions, but
> > I was wondering if something for this already existed? The server I plan
> > on putting this on does not have access to the shell (although it is a
> > Linux server) so I won't be able to have Lynx or Elinks parse the
> > content for me either :(
> >
> > Thanks,
> > Ash
> > http://www.ashleysheridan.co.uk
> >
>
> Are you sure you can't use DOM? It has a function specifically for
> parsing HTML that "does not have to be well-formed to load."
>
> http://www.php.net/manual/en/domdocument.loadhtml.php
>
>
> If that doesn't work, you might look at Zend_Filter_StripTags in ZF. I
> don't know if it will do exactly what you're after, but it seems to be
> more flexible than the strip_tags function built into PHP.
>
> Andrew
>


-- 
http://www.interpotential.com
http://www.ilikealot.com

Phone: +4520371433


Re: [PHP] strip tags but preserve title attributes

2009-12-15 Thread Andrew Ballard
On Mon, Dec 14, 2009 at 6:43 PM, Ashley Sheridan wrote:
> I'm looking for a way to strip HTML tags out of some text content
> (sourced from a web page) to leave just the text which I'll be running
> some basic analysis on. The thing is, I want to preserve text that is in
> alt and title attributes. I can't use any DOM functions, as I can't
> guarantee that the content will be valid XHTML, although it should be
> valid HTML.
>
> I'm happy doing this with string functions and regular expressions, but
> I was wondering if something for this already existed? The server I plan
> on putting this on does not have access to the shell (although it is a
> Linux server) so I won't be able to have Lynx or Elinks parse the
> content for me either :(
>
> Thanks,
> Ash
> http://www.ashleysheridan.co.uk
>

Are you sure you can't use DOM? It has a function specifically for
parsing HTML that "does not have to be well-formed to load."

http://www.php.net/manual/en/domdocument.loadhtml.php
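
For illustration, here is a minimal sketch of how DOMDocument::loadHTML
could be combined with DOMXPath to keep the visible text together with
alt and title values. The function name extractTextWithAttributes is
hypothetical (it is not from this thread), the scalar type hints assume
PHP 7 or later, and the sketch makes no attempt to handle character
encodings, so treat it as a starting point rather than a finished
solution.

```php
<?php
// Hypothetical helper (not from the thread): collects text nodes plus
// alt/title attribute values from HTML that need not be well-formed.
function extractTextWithAttributes(string $html): string
{
    // loadHTML() rejects empty input, so bail out early.
    if (trim($html) === '') {
        return '';
    }

    $doc = new DOMDocument();

    // Sloppy markup makes libxml emit warnings; buffer them instead of
    // printing them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Text nodes outside <script>/<style>, plus every alt and title attribute.
    $query = '//text()[not(ancestor::script) and not(ancestor::style)]'
           . ' | //@alt | //@title';

    $parts = [];
    foreach ($xpath->query($query) as $node) {
        $value = trim($node->nodeValue);
        if ($value !== '') {
            $parts[] = $value;
        }
    }

    return implode(' ', $parts);
}
```

Calling extractTextWithAttributes() on a fragment such as
'<p title="greeting">Hello <img src="a.png" alt="a cat"> world</p>'
should return a single string containing the paragraph text along with
"greeting" and "a cat", with no markup left in it.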


If that doesn't work, you might look at Zend_Filter_StripTags in ZF. I
don't know if it will do exactly what you're after, but it seems to be
more flexible than the strip_tags function built into PHP.
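
If DOM is not an option at all, the built-in strip_tags can still be
used after a regular expression has lifted the alt and title values out
of the tags. The sketch below is only an illustration under some loud
assumptions: the function name stripTagsKeepAltTitle is hypothetical,
attribute values are assumed to be quoted, and a ">" inside an
attribute value will confuse the tag pattern.

```php
<?php
// Hypothetical helper (not from the thread): string/regex fallback that
// keeps alt and title values while removing all tags.
function stripTagsKeepAltTitle(string $html): string
{
    // Drop <script> and <style> blocks, contents included.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1\s*>#is', ' ', $html) ?? $html;

    // Replace each opening tag that carries alt/title with those values.
    // Assumption: values are quoted and contain no ">" character.
    $html = preg_replace_callback(
        '/<[a-z][^>]*>/i',
        function (array $m): string {
            preg_match_all('/\s(?:alt|title)\s*=\s*(["\'])(.*?)\1/is', $m[0], $attrs);
            // Tags without alt/title are left for strip_tags() below.
            return $attrs[2] ? ' ' . implode(' ', $attrs[2]) . ' ' : $m[0];
        },
        $html
    ) ?? $html;

    // Remove the remaining tags, decode entities, and collapse whitespace.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');

    return trim(preg_replace('/\s+/', ' ', $text) ?? $text);
}

echo stripTagsKeepAltTitle(
    '<p title="greeting">Hello <img src="a.png" alt="a cat"> <b>world</b></p>'
), "\n";
// prints: greeting Hello a cat world
```

Leaving tags without alt/title untouched until strip_tags runs avoids
inserting spaces in the middle of words that contain inline markup.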

Andrew




[PHP] strip tags but preserve title attributes

2009-12-14 Thread Ashley Sheridan
I'm looking for a way to strip HTML tags out of some text content
(sourced from a web page) to leave just the text which I'll be running
some basic analysis on. The thing is, I want to preserve text that is in
alt and title attributes. I can't use any DOM functions, as I can't
guarantee that the content will be valid XHTML, although it should be
valid HTML.

I'm happy doing this with string functions and regular expressions, but
I was wondering if something for this already existed? The server I plan
on putting this on does not have access to the shell (although it is a
Linux server) so I won't be able to have Lynx or Elinks parse the
content for me either :(

Thanks,
Ash
http://www.ashleysheridan.co.uk