On Jun 5, 2008, at 12:59 PM, Matt Feifarek wrote:

> I'd like to use something like the "truncate" feature of webhelpers
> on html data that's being pulled in from an ATOM feed.
>
> If I just use a simple truncate, it might leave some html tags
> opened (like a <div> without a </div>) which is Bad.
>
> I figured that this was a common-enough task that I'd ask some
> experts before trying to roll my own solution. It seems like the
> kind of thing that might be hidden within the standard library
> somewhere, below my nose, but outside of my ability to discover.
>
> I've found this:
> http://code.djangoproject.com/browser/django/trunk/django/utils/text.py
>
> Looks to be about the right thing, but I'd rather not be dependent
> on all of Django to do this.
>
> Perhaps some ElementTree or LXML wizard knows a quick hack?
>
> Thanks!

I've had excellent luck stripping HTML with the following:

http://www.aminus.net/browser/cleanhtml.py

I use it to strip out all the HTML, leaving a nice plain string. It does the best job of any solution I've seen.

TJ
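
For anyone who wants a standard-library-only version of the same idea, here is a minimal sketch. It is not the cleanhtml.py code linked above; it only illustrates the "strip to plain text, then truncate" approach using Python's html.parser module. The names TextExtractor, strip_html and truncate_text are made up for this example, and the "..." suffix is an assumption. Note that it discards all markup, so it avoids the unclosed-tag problem by not keeping any tags, rather than by balancing them.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment, discarding tags.

    Hypothetical helper for illustration; it does not skip the contents
    of <script> or <style> elements.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for every run of text between tags.
        self.parts.append(data)


def strip_html(fragment):
    """Return the plain text of an HTML fragment with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered by the parser
    return " ".join("".join(parser.parts).split())


def truncate_text(text, length, suffix="..."):
    """Shorten text to at most `length` characters, cutting at a space if possible."""
    if len(text) <= length:
        return text
    limit = max(length - len(suffix), 0)
    # Look for the last space at or before the limit so words stay whole.
    cut = text.rfind(" ", 0, limit + 1)
    if cut <= 0:
        cut = limit  # no usable space: fall back to a hard cut
    return text[:cut].rstrip() + suffix


if __name__ == "__main__":
    # The <div> is deliberately left unclosed, as in the feed scenario.
    entry = "<div><p>Hello <b>world</b>, this is a feed entry.</p>"
    print(truncate_text(strip_html(entry), 20))
```

Because the tags are removed before truncating, an unclosed <div> in the feed data cannot leak into the page. If the markup has to survive the truncation, a tag-aware approach like the Django function mentioned in the question is still needed.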
