Matt Feifarek wrote:
> I'd like to use something like the "truncate" feature of webhelpers on
> html data that's being pulled in from an ATOM feed.
>
> If I just use a simple truncate, it might leave some html tags opened
> (like a <div> without a </div>) which is Bad.
>
> I figured that this was a common-enough task that I'd ask some experts
> before trying to roll my own solution. It seems like the kind of thing
> that might be hidden within the standard library somewhere, below my
> nose, but outside of my ability to discover.
>
> I've found this:
> http://code.djangoproject.com/browser/django/trunk/django/utils/text.py
>
> Looks to be about the right thing, but I'd rather not be dependent on
> all of Django to do this.
>
> Perhaps some ElementTree or LXML wizard knows a quick hack?

Well... it's hard to truncate exactly, because the visible text is
spread across nested elements (each element's text and tail), so you
have to walk backwards from the end. An untested attempt with lxml:
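
# doc is an lxml.html element, e.g. from lxml.html.fromstring()

def truncate(doc, chars):
    """Truncate the document in-place to the given number of
    visible characters"""
    length = len(doc.text_content())
    if length > chars:
        # text_content() does not include doc's own tail, so set it
        # aside while stripping and restore it afterwards.
        tail, doc.tail = doc.tail, None
        _truncate_tail(doc, length - chars)
        doc.tail = tail

def _truncate_tail(doc, strip):
    """Strip characters from the end of doc (tail, then children,
    then text); return the number still left to strip"""
    doc.tail, strip = strip_chars(doc.tail, strip)
    while strip:
        if not len(doc):
            break
        strip = _truncate_tail(doc[-1], strip)
        if strip:
            # The last child is used up; lxml elements have no
            # pop(), so delete it by index.
            del doc[-1]
    if strip:
        doc.text, strip = strip_chars(doc.text, strip)
    return strip

def strip_chars(string, strip):
    """Remove up to strip characters from the end of string;
    return (new_string, characters_left_to_strip)"""
    if string is None:
        return None, strip
    if len(string) > strip:
        return string[:len(string) - strip], 0
    else:
        return '', strip - len(string)
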
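For anyone who would rather not depend on lxml, the same
walk-backwards-from-the-end idea can be sketched with the standard
library's xml.etree.ElementTree. This is only a sketch: it assumes the
input is already well-formed XML (feed HTML often is not), it counts
every character of text including whitespace, and the helper name
_truncate_content is made up for this example.

```python
import xml.etree.ElementTree as ET


def strip_chars(string, strip):
    """Drop up to `strip` characters from the end of `string`.

    Returns (remaining_string, characters_still_to_strip).
    """
    if string is None:
        return None, strip
    if len(string) > strip:
        return string[:len(string) - strip], 0
    return '', strip - len(string)


def _truncate_content(el, strip):
    """Strip `strip` characters from the end of el's content (not el.tail)."""
    while strip and len(el):
        last = el[-1]
        # The tail follows the child in document order, so trim it first.
        last.tail, strip = strip_chars(last.tail, strip)
        if strip:
            strip = _truncate_content(last, strip)
        if strip:
            # The child is used up entirely; drop the now-empty element.
            del el[-1]
    if strip:
        el.text, strip = strip_chars(el.text, strip)
    return strip


def truncate(el, chars):
    """Truncate el in place to at most `chars` characters of text."""
    length = len(''.join(el.itertext()))
    if length > chars:
        _truncate_content(el, length - chars)


root = ET.fromstring('<div>Hello <b>big</b> world</div>')
truncate(root, 8)
result = ET.tostring(root, encoding='unicode')
print(result)  # <div>Hello <b>bi</b></div>
```

Because the tree is edited rather than the serialized string, every
tag that survives is still closed when the result is written out.
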
If you are inclined to finish this and write some doctest-style tests,
I could add it to lxml.html, probably in lxml.html.clean (which also
has functions for word-breaking and auto-linking, which seem related).
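
A doctest-style starting point might look like the sketch below. It
covers only the strip_chars helper, repeated here so the file runs on
its own with nothing but the standard library; the example inputs are
made up for illustration.

```python
import doctest


def strip_chars(string, strip):
    """Remove up to `strip` characters from the end of `string`.

    Returns (new_string, characters_still_to_strip).

    >>> strip_chars('hello', 2)
    ('hel', 0)
    >>> strip_chars('hi', 5)
    ('', 3)
    >>> strip_chars('abc', 3)
    ('', 0)
    >>> strip_chars(None, 4)
    (None, 4)
    """
    if string is None:
        return None, strip
    if len(string) > strip:
        return string[:len(string) - strip], 0
    return '', strip - len(string)


if __name__ == '__main__':
    # Runs the examples in the docstrings above; silent on success.
    doctest.testmod()
```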
--
Ian Bicking : [EMAIL PROTECTED] : http://blog.ianbicking.org