On Thu, Jun 5, 2008 at 11:56 AM, TJ Ninneman <[EMAIL PROTECTED]> wrote:
> On Jun 5, 2008, at 12:59 PM, Matt Feifarek wrote:
>
>> I'd like to use something like the "truncate" feature of webhelpers on html
>> data that's being pulled in from an ATOM feed.
>>
>> If I just use a simple truncate, it might leave some html tags opened (like
>> a <div> without a </div>) which is Bad.
>>
>> I figured that this was a common-enough task that I'd ask some experts
>> before trying to roll my own solution. It seems like the kind of thing that
>> might be hidden within the standard library somewhere, below my nose, but
>> outside of my ability to discover.
>>
>> I've found this:
>> http://code.djangoproject.com/browser/django/trunk/django/utils/text.py
>>
>> Looks to be about the right thing, but I'd rather not be dependent on all of
>> Django to do this.
>>
>> Perhaps some ElementTree or LXML wizard knows a quick hack?
>>
>> Thanks!
>
> I've had excellent luck stripping HTML with the following:
> http://www.aminus.net/browser/cleanhtml.py
> I use it to strip out all the html leaving a nice plain string.  It does the
> best job of any solutions I've seen.
>
> TJ

I think he just wants to make sure the HTML stays well-formed, not strip
the tags completely.  That said, strip_tags() is something WebHelpers
should provide; I've noticed its absence a couple of times.  I'm just
not sure of the best implementation.

    - sgmllib (used in cleanhtml.py): not in Python 3.  Can
cleanhtml.py be ported to HTMLParser?

    - lxml: hard to install on Mac and Windows due to C dependencies.

    - BeautifulSoup: has the best ability to parse real-world (i.e.,
malformed) HTML.  However, it's a largish library, so I'm not sure any
helper should depend on it.

    - Simplicity vs speed.  Would routines that depend only on the
Python standard library be fast enough?
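
For the standard-library option, here is a minimal sketch of what a
strip_tags() built on HTMLParser might look like (html.parser in
Python 3).  The names strip_tags and _TextExtractor are hypothetical,
not an existing WebHelpers API, and I have not measured its speed or
tried it on badly broken markup:

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes and ignore every tag (hypothetical helper class)."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain text.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html):
    """Return the text content of `html` with all tags removed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


print(strip_tags("<p>Hello <b>world</b></p>"))
```

Note that this keeps the contents of <script> and <style> elements as
text; a real helper would probably want to skip those.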

As for Matt's case of truncating HTML without leaving it malformed,
would this be used widely enough to justify adding a helper for it to
WebHelpers?
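
To make the idea concrete, here is a rough sketch of such a truncation
using only the standard library: count visible characters, remember
which tags are open, and close them once the limit is reached.  The
name truncate_html, the _Truncator class and the VOID_TAGS set are all
assumptions of mine, not anything in WebHelpers or Django, and the
sketch drops comments and would mangle <script>/<style> content:

```python
from html import escape
from html.parser import HTMLParser

# Elements that never take a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class _Truncator(HTMLParser):
    """Copy markup through until `limit` text characters have been emitted."""

    def __init__(self, limit):
        super().__init__(convert_charrefs=True)
        self.remaining = limit
        self.out = []
        self.open_tags = []   # stack of currently open, non-void tags
        self.done = False     # True once the character limit is reached

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        self.out.append(self.get_starttag_text())  # keep attributes verbatim
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>: nothing to track on the stack.
        if not self.done:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.done or tag not in self.open_tags:
            return  # past the limit, or a stray close tag
        # Close everything up to and including the matching open tag.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append("</%s>" % top)
            if top == tag:
                break

    def handle_data(self, data):
        if self.done:
            return
        if len(data) >= self.remaining:
            data = data[:self.remaining]
            self.done = True
        self.remaining -= len(data)
        # Entities were decoded by the parser, so re-escape the text.
        self.out.append(escape(data, quote=False))


def truncate_html(html, limit):
    """Keep at most `limit` text characters and close any tags left open."""
    parser = _Truncator(limit)
    parser.feed(html)
    parser.close()
    closing = "".join("</%s>" % t for t in reversed(parser.open_tags))
    return "".join(parser.out) + closing


print(truncate_html("<div><p>Hello <b>world</b> and more</p></div>", 8))
```

The limit counts text characters only, so tags and attributes do not
eat into it.  Whether that is the right behavior for feed summaries is
a design choice; the Django code Matt linked counts words instead.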

-- 
Mike Orr <[EMAIL PROTECTED]>
