On Jun 12, 5:13 am, "Mike Orr" <[EMAIL PROTECTED]> wrote:
> Although again, we have two issues.  One is HTML-to-text (essentially
> lynx-as-a-function).  The other is truncating an HTML string while
> keeping it well-formed (which means not stopping in the middle of a
> tag and closing any open tags).

You might also want to look here:

http://www.zope.org/Members/chrisw/StripOGram
http://www.gnome.org/~jdub/bzr/planet/2.0/planet/sanitize.py

My $0.02 is that truncating HTML while keeping it well-formed is not
worth the time it would take to implement in a web helper.  Take this
example, for instance:

<h1>My Page Subject</h1>
<div>
    <p>Lorem Ipsum...[another 200 characters]</p>
    <p>Lorem Ipsum...[another 200 characters]</p>
    <p>Lorem Ipsum...[another 200 characters]</p>
    <p>Lorem Ipsum...[another 200 characters]</p>
    <p>Lorem Ipsum...[another 200 characters]</p>
</div>

Let's say that I want the first 150 characters.  What is going to
happen?  I am going to get 1000+ characters because of the <div> that
is wrapping everything, OR I will get nothing but the header.  Neither
is what I want.

Whenever I have come across the need to truncate HTML, I have always
been able to just do a strip-tags first.  Most of the time I am just
trying to display a "summary" of a larger HTML-formatted page/document,
and losing formatting for summary purposes is usually not that big of
a deal.
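
For what it's worth, here is a rough sketch of the strip-tags-then-truncate
approach using only the standard library's html.parser.  The names
`summarize`, `limit` and `suffix` are made up for illustration and are not
part of any existing helper.  It assumes `limit` is larger than
len(suffix), and it does not skip <script> or <style> contents.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment, dropping all tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def summarize(html, limit=150, suffix="..."):
    """Strip tags, collapse whitespace, and cut to at most `limit` characters.

    Hypothetical helper for illustration; assumes limit > len(suffix).
    """
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Join text nodes with a space so adjacent block elements do not run
    # together, then collapse every run of whitespace to a single space.
    # Trade-off: inline markup inside a word ("foo<b>bar</b>") gains a space.
    text = " ".join(" ".join(parser.parts).split())
    if len(text) <= limit:
        return text
    # Prefer cutting at a space so a word is not split in half.
    cut = text.rfind(" ", 0, limit - len(suffix) + 1)
    if cut <= 0:
        cut = limit - len(suffix)
    return text[:cut] + suffix
```

Calling summarize(page_html, 150) on the example above would give the
heading text followed by the start of the first paragraph, with no markup
left to keep balanced.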

Is there a possible need/use for truncating HTML and leaving it
well-formed?  Maybe.  Is it a trivial enough implementation to put in a
web helper?  Not IMO.
