On Jun 12, 5:13 am, "Mike Orr" <[EMAIL PROTECTED]> wrote:
> Although again, we have two issues. One is HTML-to-text (essentially
> lynx-as-a-function). The other is truncating an HTML string while
> keeping it well-formed (which means not stopping in the middle of a
> tag and closing any open tags).
You might also want to look here:
http://www.zope.org/Members/chrisw/StripOGram
http://www.gnome.org/~jdub/bzr/planet/2.0/planet/sanitize.py
My $0.02 is that truncating HTML while keeping it well-formed is not
something worth spending time implementing in a web helper. Take this
example, for instance:
<h1>My Page Subject</h1>
<div>
<p>Lorem Ipsum...[another 200 characters]</p>
<p>Lorem Ipsum...[another 200 characters]</p>
<p>Lorem Ipsum...[another 200 characters]</p>
<p>Lorem Ipsum...[another 200 characters]</p>
<p>Lorem Ipsum...[another 200 characters]</p>
</div>
Let's say that I want the first 150 characters. What is going to
happen? Either I get 1000+ characters because of the <div> that is
wrapping everything, or I get nothing but the header. Neither is
what I want.
Whenever I have come across the need to truncate HTML, I have always
been able to just do a strip-tags first. Most of the time I am just
trying to display a "summary" of a larger HTML-formatted page/document,
and losing formatting for summary purposes is usually not that big of
a deal.
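
As a rough illustration of the strip-tags-then-truncate approach, here is a
minimal sketch using only the Python 3 standard library. The names
summarize, _TextExtractor and BLOCK_TAGS are made up for this example and
are not part of any existing helper; the list of block-level tags is an
assumption and is far from complete.

```python
from html.parser import HTMLParser

# Assumption: a small, incomplete set of tags that should act as word breaks.
BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "h5", "h6"}


class _TextExtractor(HTMLParser):
    """Collects text nodes and drops all markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def summarize(html, limit=150, suffix="..."):
    """Strip tags, collapse whitespace, then cut at a word boundary."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join("".join(parser.parts).split())
    if len(text) <= limit:
        return text
    # Drop the last (possibly partial) word before appending the suffix.
    return text[:limit].rsplit(" ", 1)[0] + suffix
```

Calling summarize(page_html, 150) on the example above would give the header
text followed by the start of the first paragraph as plain text. Note that
this sketch does not skip the contents of <script> or <style> elements.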
Is there a possible need/use for truncating HTML and leaving it
well-formed? Maybe. Is it a trivial enough implementation to put in a
web helper? Not IMO.
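
To give a sense of the bookkeeping involved, here is a hedged sketch of a
tag-aware truncator in Python 3. The names truncate_html, _Truncator and
VOID_TAGS are hypothetical. It counts only decoded text characters
(including whitespace between tags), keeps a stack of open tags, and closes
them at the cut point. It makes no attempt to pick a sensible place to cut,
and it ignores comments, <script>/<style> contents and badly nested input.

```python
from html import escape
from html.parser import HTMLParser

# Assumption: elements that never take a closing tag (not exhaustive).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class _Truncator(HTMLParser):
    def __init__(self, limit):
        super().__init__(convert_charrefs=True)
        self.remaining = limit   # text characters still allowed
        self.out = []            # output fragments
        self.stack = []          # currently open tags
        self.done = False        # True once the limit is reached

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        self.out.append(self.get_starttag_text())
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.done or tag not in self.stack:
            return
        # Close any unclosed inner tags along with the matching one.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append("</%s>" % open_tag)
            if open_tag == tag:
                break

    def handle_data(self, data):
        if self.done:
            return
        data = data[:self.remaining]
        self.remaining -= len(data)
        # Text arrives decoded, so re-escape it for output.
        self.out.append(escape(data, quote=False))
        if self.remaining == 0:
            self.done = True

    def result(self):
        closing = "".join("</%s>" % t for t in reversed(self.stack))
        return "".join(self.out) + closing


def truncate_html(html, limit):
    parser = _Truncator(limit)
    parser.feed(html)
    parser.close()
    return parser.result()
```

Even this much code only guarantees balanced tags; it says nothing about
whether the fragment it returns is a useful summary.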