On Thu, Jun 12, 2008 at 7:55 AM, rcs_comp <[EMAIL PROTECTED]> wrote:
>
>
>
> On Jun 12, 5:13 am, "Mike Orr" <[EMAIL PROTECTED]> wrote:
>> Although again, we have two issues.  One is HTML-to-text (essentially
>> lynx-as-a-function).  The other is truncating an HTML string while
>> keeping it well-formed (which means not stopping in the middle of a
>> tag and closing any open tags).
>
> Actually, I think we may have four issues...?
>
> 1) truncate HTML and end up with well-formed HTML.

I agree with you; I'm not convinced this is a broad enough need to
warrant a webhelper.  But some significant use cases would help
convince me.
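For reference, case 1 can be sketched with only the stdlib's html.parser (Python 3). truncate_html, its limit/suffix parameters and the void-element list are my own assumptions, not an existing WebHelpers API. It counts visible text characters, may cut mid-word, drops comments, and closes whatever tags are still open at the cut:

```python
from html import escape
from html.parser import HTMLParser

# Elements with no closing tag; they are never pushed on the open-tag stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class _Truncator(HTMLParser):
    def __init__(self, limit, suffix):
        super().__init__(convert_charrefs=True)
        self.remaining = limit  # visible characters still allowed
        self.suffix = suffix    # appended at the cut; not counted in the limit
        self.out = []           # pieces of the output HTML
        self.stack = []         # currently open non-void tags
        self.done = False       # True once the cut has been made

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        attr_text = "".join(
            f' {name}="{escape(value, quote=True)}"' if value is not None else f" {name}"
            for name, value in attrs
        )
        self.out.append(f"<{tag}{attr_text}>")
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.done or tag not in self.stack:
            return  # past the cut, or a stray closing tag
        # Close anything left open inside this element, then the element itself.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        if self.done:
            return
        if len(data) <= self.remaining:
            self.out.append(escape(data, quote=False))
            self.remaining -= len(data)
        else:
            self.out.append(escape(data[:self.remaining], quote=False) + self.suffix)
            self.remaining = 0
            self.done = True


def truncate_html(html, limit, suffix="..."):
    """Cut html after `limit` visible characters and close any open tags."""
    parser = _Truncator(limit, suffix)
    parser.feed(html)
    parser.close()  # flushes text buffered because of convert_charrefs=True
    return "".join(parser.out) + "".join(f"</{tag}>" for tag in reversed(parser.stack))
```

E.g. truncate_html("<p>Hello <b>world</b>, again</p>", 8) gives "<p>Hello <b>wo...</b></p>". Cutting on a word boundary would be a small extension.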

> 2) strip all HTML tags (without an interest in text formatting)
> 3) html2text (trying to keep text formatting with p, block, etc.)

Ian's code handles p and div, and treats block as p.  Other tags are
stripped and ignored.  We can extend it if we want more sophisticated
formatting.  Actually, indented blocks would be useful.  And
optionally displaying the hrefs.  (Lynx does this with footnotes.)
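Roughly what that could look like, again stdlib-only. This is not Ian's code: html2text, show_links and the tag sets are assumptions. Paragraph tags (p, div, blockquote here) become blank-line breaks, br becomes a newline, script/style content is dropped, and with show_links=True each href becomes a numbered footnote, lynx-style:

```python
from html.parser import HTMLParser

# Tags treated as paragraph breaks in this sketch (an assumption, not Ian's list).
PARAGRAPH_TAGS = {"p", "div", "blockquote"}
SKIP_TAGS = {"script", "style"}  # their contents are not text for the reader


class _TextExtractor(HTMLParser):
    def __init__(self, show_links):
        super().__init__(convert_charrefs=True)
        self.show_links = show_links
        self.paragraphs = [[]]    # each paragraph is a list of text pieces
        self.links = []           # hrefs, numbered in order of appearance
        self.pending_href = None  # href of the <a> we are currently inside
        self.skip_depth = 0       # > 0 while inside <script>/<style>

    def _new_paragraph(self):
        if self.paragraphs[-1]:   # avoid piling up empty paragraphs
            self.paragraphs.append([])

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in PARAGRAPH_TAGS:
            self._new_paragraph()
        elif tag == "br":
            self.paragraphs[-1].append("\n")
        elif tag == "a" and self.show_links:
            self.pending_href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in PARAGRAPH_TAGS:
            self._new_paragraph()
        elif tag == "a" and self.pending_href:
            self.links.append(self.pending_href)
            self.paragraphs[-1].append(f"[{len(self.links)}]")
            self.pending_href = None

    def handle_data(self, data):
        if not self.skip_depth:
            # Source newlines are plain whitespace in HTML; only <br> breaks a line.
            self.paragraphs[-1].append(data.replace("\r", " ").replace("\n", " "))


def html2text(html, show_links=False):
    parser = _TextExtractor(show_links)
    parser.feed(html)
    parser.close()
    blocks = []
    for pieces in parser.paragraphs:
        # Collapse whitespace within each line but keep the breaks from <br>.
        lines = [" ".join(line.split()) for line in "".join(pieces).split("\n")]
        text = "\n".join(lines).strip()
        if text:
            blocks.append(text)
    result = "\n\n".join(blocks)
    if parser.links:
        notes = "\n".join(f"[{n}] {href}" for n, href in enumerate(parser.links, 1))
        result += "\n\nReferences\n" + notes
    return result
```

Indented blocks could be added the same way, by tracking blockquote depth and prefixing each line.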


> 4) sanitizing HTML (not directly discussed here, but a good
> implementation of this will be helpful, increase security, and should
> be able to be extended trivially to provide #2, stripping all HTML
> tags).

What exactly do you mean by sanitizing?  Stripping all except a few
formatting tags?  This would be good for WebHelpers if somebody can
provide an implementation, ideally one that doesn't depend on non-stdlib packages.
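If sanitizing means a whitelist (keep a few formatting tags and safe attributes, drop the rest), a stdlib-only sketch might look like this. sanitize_html, DEFAULT_ALLOWED and the URL-scheme check are assumptions for illustration, and the code has not been security-reviewed. Passing an empty whitelist gives #2 from the list above, stripping all tags:

```python
import re
from html import escape
from html.parser import HTMLParser

# Hypothetical default whitelist: tag -> attributes allowed on that tag.
DEFAULT_ALLOWED = {
    "a": {"href", "title"}, "b": set(), "i": set(), "em": set(),
    "strong": set(), "p": set(), "br": set(), "ul": set(), "ol": set(), "li": set(),
}
VOID_ELEMENTS = {"br", "hr", "img"}
DROP_CONTENT = {"script", "style"}         # remove these tags *and* their contents
SAFE_SCHEMES = {"http", "https", "mailto"}
_SCHEME = re.compile(r"([A-Za-z][A-Za-z0-9+.\-]*):")


def _safe_url(value):
    # Drop whitespace/control characters first so the check stays conservative
    # (" javascript:" or "java\tscript:" are still caught). No scheme = relative URL.
    cleaned = "".join(ch for ch in value if ord(ch) > 32)
    match = _SCHEME.match(cleaned)
    return match is None or match.group(1).lower() in SAFE_SCHEMES


class _Sanitizer(HTMLParser):
    def __init__(self, allowed):
        super().__init__(convert_charrefs=True)
        self.allowed = allowed
        self.out = []         # pieces of the cleaned HTML
        self.stack = []       # whitelisted, non-void tags currently open
        self.drop_depth = 0   # > 0 while inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.drop_depth += 1
            return
        if tag not in self.allowed:
            return  # strip the tag itself; its text is still kept
        kept = "".join(
            f' {name}="{escape(value, quote=True)}"'
            for name, value in attrs
            if name in self.allowed[tag] and value is not None
            and (name != "href" or _safe_url(value))
        )
        self.out.append(f"<{tag}{kept}>")
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.drop_depth = max(0, self.drop_depth - 1)
            return
        if tag not in self.stack:
            return  # closing tag of something stripped or never opened
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        if not self.drop_depth:
            self.out.append(escape(data, quote=False))


def sanitize_html(html, allowed=None):
    """Keep only whitelisted tags/attributes; allowed={} strips every tag."""
    parser = _Sanitizer(DEFAULT_ALLOWED if allowed is None else allowed)
    parser.feed(html)
    parser.close()
    return "".join(parser.out) + "".join(f"</{t}>" for t in reversed(parser.stack))
```

Comments, doctypes and unknown attributes (onclick etc.) are silently dropped because no handler re-emits them.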

-- 
Mike Orr <[EMAIL PROTECTED]>
