Matt Feifarek wrote:
> I'd like to use something like the "truncate" feature of webhelpers on 
> html data that's being pulled in from an ATOM feed.
> 
> If I just use a simple truncate, it might leave some html tags opened 
> (like a <div> without a </div>) which is Bad.
> 
> I figured that this was a common-enough task that I'd ask some experts 
> before trying to roll my own solution. It seems like the kind of thing 
> that might be hidden within the standard library somewhere, below my 
> nose, but outside of my ability to discover.
> 
> I've found this:
> http://code.djangoproject.com/browser/django/trunk/django/utils/text.py
> 
> Looks to be about the right thing, but I'd rather not be dependent on 
> all of Django to do this.
> 
> Perhaps some ElementTree or LXML wizard knows a quick hack?

Well... it's hard to truncate exactly, as there's all that annoying 
nesting stuff.  An untested attempt with lxml:

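def truncate(doc, chars):
    """Truncate the document in-place to the given number of
    visible characters"""
    length = len(doc.text_content())
    if length > chars:
        _truncate_tail(doc, length - chars, is_root=True)

def _truncate_tail(doc, strip, is_root=False):
    """Strip up to `strip` characters from the end of doc, working
    backwards; return how many characters are still left to strip."""
    if not is_root:
        # text_content() does not count the root's own tail, so only
        # descendants get their tail trimmed
        doc.tail, strip = strip_chars(doc.tail, strip)
    while strip and len(doc):
        strip = _truncate_tail(doc[-1], strip)
        if strip:
            # the last child was used up entirely; lxml elements do
            # not provide pop(), so delete by index
            del doc[-1]
    if strip:
        doc.text, strip = strip_chars(doc.text, strip)
    return strip

def strip_chars(string, strip):
    """Cut up to `strip` characters off the end of string; return
    (new_string, characters_still_to_strip)."""
    if string is None:
        return None, strip
    if len(string) > strip:
        return string[:len(string) - strip], 0
    else:
        return '', strip - len(string)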


If you are inclined to finish this and make some tests (doctest-style) I 
could add it to lxml.html, I guess to lxml.html.clean (which also has 
functions for wordwrapping and linking, which seem related).
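
As a starting point for such tests, here is a rough sketch of the same tail-first walk ported to the standard library's xml.etree.ElementTree, so it runs without lxml installed. The port is an assumption of this sketch, not lxml.html code: ''.join(el.itertext()) stands in for text_content(), the is_root flag is a name made up here, and the input has to be well-formed XML rather than arbitrary feed HTML.

```python
import xml.etree.ElementTree as ET


def strip_chars(string, strip):
    """Cut up to `strip` characters off the end of `string`.

    Returns (new_string, characters_still_to_strip).
    """
    if string is None:
        return None, strip
    if len(string) > strip:
        return string[:len(string) - strip], 0
    return '', strip - len(string)


def _truncate_tail(el, strip, is_root=False):
    # The root's tail is not part of itertext(), so leave it alone.
    if not is_root:
        el.tail, strip = strip_chars(el.tail, strip)
    while strip and len(el):
        strip = _truncate_tail(el[-1], strip)
        if strip:
            del el[-1]  # last child was used up entirely
    if strip:
        el.text, strip = strip_chars(el.text, strip)
    return strip


def truncate(el, chars):
    """Truncate `el` in place to at most `chars` visible characters."""
    length = len(''.join(el.itertext()))
    if length > chars:
        _truncate_tail(el, length - chars, is_root=True)


doc = ET.fromstring('<div>Hello <b>big</b> world</div>')
truncate(doc, 8)
print(ET.tostring(doc, encoding='unicode'))  # <div>Hello <b>bi</b></div>

doc = ET.fromstring('<div>Hello <b>big</b> world</div>')
truncate(doc, 3)
print(ET.tostring(doc, encoding='unicode'))  # <div>Hel</div>
```

The first call cuts inside the <b> element and keeps its closing tag; the second drops the emptied <b> altogether, so no tag is left dangling either way.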

-- 
Ian Bicking : [EMAIL PROTECTED] : http://blog.ianbicking.org
