On Sun, Oct 12, 2008 at 7:19 PM, Hans Zaunere <[EMAIL PROTECTED]> wrote:
> Gentlemen,
>
>> > The safest approach is probably to pass the html through tidy, and
>> > then into DOM, and traverse and count the length of text nodes, but
>> > that would be quite slow if you ran it on every request.
>>
>> Right, +1 for Tidy and DOM, it's the "real" way to do it. You won't
>> need to do it on every request -- you can either store the summary
>> itself as a separate text field, or store the length of the summary as
>> an integer.
>
> I tried this, working through using both DOM and Tidy, and combinations of 
> each - no luck.  The problem is getting the differential between the two 
> versions of the text.
>

This is a solvable problem, but it needs to be really well defined.
I assume you want to snip the HTML to show a preview.  If you leave
in things like YouTube videos and images, the preview could still be
really long without containing much text.  Why do you need the
differential between the two versions?  Once you pass something
through Tidy, recovering the differential is practically impossible,
because Tidy can change the HTML in unpredictable ways.  Not cutting
in the middle of a tag is pretty easy to solve: just iterate over the
markup and keep track of the open tags on a stack.
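
To make the stack idea concrete, here is a rough sketch.  It is written
in Python with the standard html.parser module purely for illustration;
it is not the Tidy/DOM route discussed above, and the names
truncate_html and Truncator are made up for this example.  It assumes
reasonably well-formed input, counts only text characters toward the
limit, and silently drops comments.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never take a closing tag, so they must not go on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Truncator(HTMLParser):
    """Hypothetical helper: copy HTML through until `limit` text chars are emitted."""

    def __init__(self, limit):
        super().__init__(convert_charrefs=True)
        self.remaining = limit  # text characters still allowed
        self.stack = []         # tags opened but not yet closed
        self.out = []           # output fragments
        self.done = False       # True once the limit has been reached

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        # Emit the start tag exactly as it appeared in the source.
        self.out.append(self.get_starttag_text())
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>: emit it, nothing to track.
        if not self.done:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.done or tag not in self.stack:
            return  # past the limit, or a stray close tag
        # Close anything left open inside this tag, then the tag itself.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append("</%s>" % open_tag)
            if open_tag == tag:
                break

    def handle_data(self, data):
        if self.done:
            return
        if len(data) >= self.remaining:
            data = data[:self.remaining]
            self.done = True
        self.remaining -= len(data)
        # Text arrives unescaped, so escape it again on the way out.
        self.out.append(escape(data, quote=False))

    def result(self):
        # Close whatever is still open, innermost tag first.
        closing = "".join("</%s>" % t for t in reversed(self.stack))
        return "".join(self.out) + closing


def truncate_html(html, limit):
    parser = Truncator(limit)
    parser.feed(html)
    parser.close()
    return parser.result()
```

For example, truncate_html('<p>Hello <b>big world</b> again</p>', 9)
gives '<p>Hello <b>big</b></p>': the cut falls inside the text, and the
two tags still on the stack are closed in reverse order.  Because only
text counts toward the limit, embedded images and videos before the cut
pass through untouched, which is the "long preview with little text"
case mentioned above.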

-John Campbell
_______________________________________________
New York PHP Community Talk Mailing List
http://lists.nyphp.org/mailman/listinfo/talk

