On Thu, Jun 5, 2008 at 5:03 PM, Mike Orr <[EMAIL PROTECTED]> wrote:
>
> On Thu, Jun 5, 2008 at 1:01 PM, Ian Bicking <[EMAIL PROTECTED]> wrote:
> >
> > Mike Orr wrote:
> >> On Thu, Jun 5, 2008 at 11:56 AM, TJ Ninneman <[EMAIL PROTECTED]> wrote:
> >>> On Jun 5, 2008, at 12:59 PM, Matt Feifarek wrote:
> >>>
> >>> I'd like to use something like the "truncate" feature of webhelpers
> >>> on html data that's being pulled in from an ATOM feed.
> >>>
> >>> If I just use a simple truncate, it might leave some html tags
> >>> opened (like a <div> without a </div>) which is Bad.
> >>>
> >>> I figured that this was a common-enough task that I'd ask some
> >>> experts before trying to roll my own solution. It seems like the
> >>> kind of thing that might be hidden within the standard library
> >>> somewhere, below my nose, but outside of my ability to discover.
> >>>
> >>> I've found this:
> >>> http://code.djangoproject.com/browser/django/trunk/django/utils/text.py
> >>>
> >>> Looks to be about the right thing, but I'd rather not be dependent
> >>> on all of Django to do this.
> >>>
> >>> Perhaps some ElementTree or LXML wizard knows a quick hack?
> >>>
> >>> Thanks!
> >>>
> >>> I've had excellent luck stripping HTML with the following:
> >>> http://www.aminus.net/browser/cleanhtml.py
> >>> I use it to strip out all the html leaving a nice plain string. It
> >>> does the best job of any solutions I've seen.
> >>>
> >>> TJ
> >>
> >> I think he just wants to make sure the HTML is well-formed, not strip
> >> the tags completely. However, strip_tags() is something WebHelpers
> >> should provide. I've noticed the lack a couple times. However, I'm
> >> not sure of the best implementation.
> >
> > strip_tags should be easy enough to implement with some regexes -- you
> > just have to remove <.*?>, then resolve any entities.
> >
> > This code does some fairly simplistic rendering of HTML (but better than
> > what strip_tags would likely do), and might have a better home in
> > WebHelpers:
> > http://svn.w4py.org/ZPTKit/trunk/ZPTKit/htmlrender.py
>
> Put in the WebHelpers "unfinished" directory and opened ticket #458 to
> integrate it.

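For what it's worth, Ian's regex suggestion quoted above (remove <.*?>, then resolve any entities) might look roughly like the sketch below. This is only an illustration and not the WebHelpers implementation: the names TAG_RE and strip_tags are my own, and it assumes Python 3, where html.unescape is in the standard library.

```python
import html
import re

# Hypothetical helper names, chosen for illustration only.
# Non-greedy match of anything that looks like a tag; DOTALL lets a tag
# span more than one line.
TAG_RE = re.compile(r"<.*?>", re.DOTALL)


def strip_tags(markup):
    """Remove tag-like spans, then resolve character entities."""
    # Strip first, unescape second, so escaped text such as "&lt;b&gt;"
    # is kept as literal "<b>" instead of being removed as a tag.
    text = TAG_RE.sub("", markup)
    return html.unescape(text)


print(strip_tags("<p>Fish &amp; <b>chips</b></p>"))  # prints: Fish & chips
```

A regex like this is easily confused by a ">" inside an attribute value, by comments, and by the contents of script or style elements, so treat it as a rough cut rather than a robust stripper. It also does nothing for the original question of truncating HTML while keeping the tags balanced.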
I have some boilerplate multi-threaded examples of using Beautiful Soup here:
http://www-128.ibm.com/developerworks/aix/library/au-threadingpython/

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "pylons-discuss" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/pylons-discuss?hl=en
-~----------~----~----~----~------~----~------~--~---
