On Thu, Jun 5, 2008 at 5:03 PM, Mike Orr <[EMAIL PROTECTED]> wrote:

>
> On Thu, Jun 5, 2008 at 1:01 PM, Ian Bicking <[EMAIL PROTECTED]> wrote:
> >
> > Mike Orr wrote:
> >> On Thu, Jun 5, 2008 at 11:56 AM, TJ Ninneman <[EMAIL PROTECTED]> wrote:
> >>> On Jun 5, 2008, at 12:59 PM, Matt Feifarek wrote:
> >>>
> >>> I'd like to use something like the "truncate" feature of WebHelpers on
> >>> HTML data that's being pulled in from an Atom feed.
> >>>
> >>> If I just use a simple truncate, it might leave some HTML tags open
> >>> (like a <div> without a </div>), which is bad.
> >>>
> >>> I figured that this was a common-enough task that I'd ask some experts
> >>> before trying to roll my own solution. It seems like the kind of thing
> >>> that might be hidden within the standard library somewhere, below my
> >>> nose, but outside of my ability to discover.
> >>>
> >>> I've found this:
> >>>
> >>> http://code.djangoproject.com/browser/django/trunk/django/utils/text.py
> >>>
> >>> Looks to be about the right thing, but I'd rather not be dependent on
> >>> all of Django to do this.
> >>>
> >>> Perhaps some ElementTree or LXML wizard knows a quick hack?
> >>>
> >>> Thanks!
> >>>
> >>>
> >>>
> >>>
> >>> I've had excellent luck stripping HTML with the following:
> >>> http://www.aminus.net/browser/cleanhtml.py
> >>> I use it to strip out all the HTML, leaving a nice plain string.  It
> >>> does the best job of any solution I've seen.
> >>>
> >>> TJ
> >>
> >> I think he just wants to make sure the HTML is well-formed, not strip
> >> the tags completely.  That said, strip_tags() is something WebHelpers
> >> should provide.  I've noticed its absence a couple of times, though I'm
> >> not sure of the best implementation.
> >
> > strip_tags should be easy enough to implement with some regexes -- you
> > just have to remove <.*?>, then resolve any entities.
> >
> > This code does some fairly simplistic rendering of HTML (but better than
> > what strip_tags would likely do), and might have a better home in
> > WebHelpers:
> > http://svn.w4py.org/ZPTKit/trunk/ZPTKit/htmlrender.py
>
> I put it in the WebHelpers "unfinished" directory and opened ticket #458
> to integrate it.
>

I have some boilerplate multi-threaded examples that use Beautiful Soup
here:

http://www-128.ibm.com/developerworks/aix/library/au-threadingpython/
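
For the strip_tags() idea quoted above (remove <.*?>, then resolve any
entities), here is a minimal sketch. It assumes Python 3, where
html.unescape is in the standard library. The function name comes from the
thread, but this is not the WebHelpers implementation, and TAG_RE is a name
made up for illustration. A regex like this will misfire on comments or
attribute values that contain ">", so treat it as a rough cut only.

```python
import html
import re

# Anything between "<" and the next ">" is treated as a tag (assumption:
# no ">" appears inside attribute values or comments).
TAG_RE = re.compile(r"<.*?>", re.DOTALL)


def strip_tags(markup):
    """Remove tag-like spans, then decode entities such as &amp;."""
    text = TAG_RE.sub("", markup)
    return html.unescape(text)


if __name__ == "__main__":
    print(strip_tags("<p>Fish &amp; <b>chips</b></p>"))
```

Removing the tags before decoding entities matters: decoding first would
turn an escaped "&lt;b&gt;" into something the regex then deletes.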

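On the original question (truncating HTML without leaving tags open), one
possible approach is to stream the markup through the standard library's
html.parser, count only the text characters, and close whatever is still
open once the limit is hit. The sketch below assumes Python 3. The names
truncate_html, _Truncator and VOID_TAGS are hypothetical, not part of
WebHelpers or Django, and the code does not try to repair input that is
already badly nested.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never take a closing tag (assumed list, HTML5 void elements).
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class _Truncator(HTMLParser):
    def __init__(self, limit):
        super().__init__(convert_charrefs=True)
        self.remaining = limit   # text characters still allowed
        self.open_tags = []      # stack of currently open elements
        self.out = []            # output fragments
        self.done = False        # True once the limit has been reached

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        self.out.append(self.get_starttag_text())
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>: emit it, nothing to track.
        if not self.done:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.done or tag not in self.open_tags:
            return
        # Close any elements left open inside this one, then the tag itself.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append("</%s>" % top)
            if top == tag:
                break

    def handle_data(self, data):
        if self.done:
            return
        if len(data) >= self.remaining:
            data = data[:self.remaining]
            self.done = True
        self.remaining -= len(data)
        # Entities were decoded by the parser, so re-escape the text.
        self.out.append(escape(data, quote=False))

    def result(self):
        closing = ["</%s>" % t for t in reversed(self.open_tags)]
        return "".join(self.out) + "".join(closing)


def truncate_html(markup, limit):
    """Keep at most `limit` text characters and close any open tags."""
    parser = _Truncator(limit)
    parser.feed(markup)
    parser.close()
    return parser.result()
```

For example, truncate_html("<div><p>Hello <b>world</b> and more</p></div>", 8)
should give "<div><p>Hello <b>wo</b></p></div>". Only text counts toward the
limit, so markup-heavy feeds are not cut short by their own tags.
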
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"pylons-discuss" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/pylons-discuss?hl=en
-~----------~----~----~----~------~----~------~--~---