Re: Truncating an html string safely

TJ Ninneman Thu, 05 Jun 2008 11:59:22 -0700

On Jun 5, 2008, at 12:59 PM, Matt Feifarek wrote:

> I'd like to use something like the "truncate" feature of webhelpers  
> on html data that's being pulled in from an ATOM feed.
>
> If I just use a simple truncate, it might leave some html tags  
> opened (like a <div> without a </div>) which is Bad.
>
> I figured that this was a common-enough task that I'd ask some  
> experts before trying to roll my own solution. It seems like the  
> kind of thing that might be hidden within the standard library  
> somewhere, below my nose, but outside of my ability to discover.
>
> I've found this:
> http://code.djangoproject.com/browser/django/trunk/django/utils/text.py
>
> Looks to be about the right thing, but I'd rather not be dependent  
> on all of Django to do this.
>
> Perhaps some ElementTree or LXML wizard knows a quick hack?
>
> Thanks!
>


I've had excellent luck stripping HTML with the following:

http://www.aminus.net/browser/cleanhtml.py

I use it to strip out all the HTML, leaving a nice plain string. It
does the best job of any solution I've seen.
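
To give a rough idea of the strip-then-truncate approach, here is a
minimal sketch. It is not the code from cleanhtml.py; it only uses
html.parser from the Python standard library, and the names
TextExtractor, strip_html and truncate_plain are made up for this
example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags; the tags themselves are ignored.
        self.parts.append(data)


def strip_html(html):
    """Return the text content of html with whitespace runs collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return " ".join("".join(parser.parts).split())


def truncate_plain(html, length=30, suffix="..."):
    """Strip the markup first, then cut the plain text to at most length chars."""
    text = strip_html(html)
    if len(text) <= length:
        return text
    return text[: length - len(suffix)].rstrip() + suffix


if __name__ == "__main__":
    # An unclosed <div> is harmless here because no tags survive stripping.
    print(truncate_plain("<div>Hello <b>world</b>, this is a feed entry", length=14))
```

Because every tag is discarded before the cut, the result can never
contain an unclosed element. The trade-off is that all formatting is
lost; if the markup has to survive, a tag-aware truncator along the
lines of the Django one is still needed.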

TJ

