On Wed, Jun 11, 2008 at 11:21 PM, Shannon -jj Behrens <[EMAIL PROTECTED]> wrote:
>
> On Sat, Jun 7, 2008 at 7:24 AM, Matt Feifarek <[EMAIL PROTECTED]> wrote:
>> Oops; replied from the wrong address.
>>
>> ---------- Forwarded message ----------
>>
>> On Thu, Jun 5, 2008 at 2:36 PM, Ian Bicking <[EMAIL PROTECTED]> wrote:
>>>
>>> Well... it's hard to truncate exactly, as there's all that annoying
>>> nesting stuff.  An untested attempt with lxml:
>>
>> Exactly. Thanks for the lead.
>>
>> I'm not sure I'm up to the challenge, but if I do get it working, I'll get
>> it back to you, in case it's good enough to be added to lxml (or whatever).
>>
>> Mike:
>> Seems like if we have the truncate function in webhelpers, a truncate that
>> handles html would be wise... since we're, err, making html, usually, with
>> Pylons.
>>
>> Since the Django code doesn't seem to depend on anything (but some Django
>> cruft, which seems to be frosting really) MAYBE it would be better to start
>> with.
>>
>> But I'll poke around a bit today.
>
> It would be fun to write a SAX handler that permits all tags, and
> counts all characters.  It would stop permitting additional characters
> once it reached a certain limit.

Just to confirm, I'm planning to use Ian's code for the WebHelpers
HTML-to-text renderer because it uses HTMLParser and has no external
dependencies.  It's currently in WebHelpers/unfinished/htmlrender.py
in the 0.6 source and at
http://svn.w4py.org/ZPTKit/trunk/ZPTKit/htmlrender.py.
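
To illustrate the general approach (this is not Ian's htmlrender.py, just a
minimal sketch of an HTML-to-text pass built only on the standard library
parser), something along these lines would work.  It assumes Python 3's
html.parser module; the names TextRenderer, BLOCK_TAGS and html_to_text are
made up for this example, and the formatting rules are deliberately crude.

```python
from html.parser import HTMLParser

# Hypothetical set of tags treated as line breaks; a real renderer would
# handle lists, tables, links and so on far more carefully.
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}
SKIPPED_TAGS = {"script", "style"}


class TextRenderer(HTMLParser):
    """Collect text content, turning block-level tags into newlines."""

    def __init__(self):
        # convert_charrefs=True makes the parser decode entities such as &amp;
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of `html`, one line per block element."""
    renderer = TextRenderer()
    renderer.feed(html)
    renderer.close()
    # Collapse runs of whitespace within each line and drop empty lines.
    lines = [" ".join(line.split()) for line in "".join(renderer.parts).splitlines()]
    return "\n".join(line for line in lines if line)
```

For example, html_to_text("<p>Hello <b>world</b></p>") gives "Hello world".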

Noah offered an alternative using BeautifulSoup, and Matt recommended
something from Django (which would mean deleting unnecessary Django
dependencies).  If somebody can tell me what these can do that Ian's
code can't, I might reconsider.

Again, though, these are two separate problems.  One is HTML-to-text
conversion (essentially lynx-as-a-function).  The other is truncating an
HTML string while keeping it well-formed, which means never cutting in the
middle of a tag and closing any tags that are still open at the cut.
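
For the second problem, here is a rough sketch of what I mean, again using
only the standard library parser.  It assumes Python 3's html.parser; the
names TruncatingParser, VOID_TAGS and truncate_html are invented for this
example and are not part of WebHelpers, lxml or Django.  It counts text
characters only (an entity counts as one), passes tags through unchanged,
and drops comments and doctypes.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag, so they are not tracked as open.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TruncatingParser(HTMLParser):
    """Copy markup through until `limit` text characters have been emitted."""

    def __init__(self, limit):
        # convert_charrefs=False keeps entities as separate events, so the
        # cut can never land in the middle of something like &amp;
        super().__init__(convert_charrefs=False)
        self.remaining = limit
        self.done = limit <= 0
        self.out = []
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        self.out.append(self.get_starttag_text())  # raw tag, attributes intact
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        if not self.done:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.done or tag not in self.open_tags:
            return  # past the cut, or a stray closing tag
        # Close everything opened since the matching start tag.
        while self.open_tags:
            opened = self.open_tags.pop()
            self.out.append("</%s>" % opened)
            if opened == tag:
                break

    def handle_data(self, data):
        if self.done:
            return
        if len(data) >= self.remaining:
            self.out.append(data[:self.remaining])
            self.done = True
        else:
            self.out.append(data)
            self.remaining -= len(data)

    def _emit_single(self, raw):
        # An entity or character reference counts as one character.
        if self.done:
            return
        self.out.append(raw)
        self.remaining -= 1
        if self.remaining <= 0:
            self.done = True

    def handle_entityref(self, name):
        self._emit_single("&%s;" % name)

    def handle_charref(self, name):
        self._emit_single("&#%s;" % name)


def truncate_html(html, limit):
    """Return `html` cut to at most `limit` text characters, tags balanced."""
    parser = TruncatingParser(limit)
    parser.feed(html)
    parser.close()
    # Close whatever is still open at the cut, innermost first.
    closers = ["</%s>" % tag for tag in reversed(parser.open_tags)]
    return "".join(parser.out + closers)
```

For example, truncate_html("<p>Hello <b>big world</b></p>", 9) gives
"<p>Hello <b>big</b></p>".  It cuts at a raw character count, so a version
for real use would probably want to back up to a word boundary and append an
ellipsis.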

-- 
Mike Orr <[EMAIL PROTECTED]>
