Lee.M wrote:
> On Apr 10, 2009, at 10:58 AM, Lee.M wrote:
>
>> Along w/ the problem of unbalancing tags there is also the white space
>> issue (e.g. you want 100 characters, you could have 'a' .
>> $ninety_five_spaces . 'b' . $tons_of_text, and the truncated verbiage
>> is essentially 'a b')
>>
>> length of character entities (e.g. &lt; == 1 character not 4)
>>
>> Fortunately it looks like someone has already addressed all of that
>> for us :)
>>
>> http://search.cpan.org/perldoc?HTML::Truncate
>
> Also, if you want to go the opposite route of just making it plain text:
>
> http://search.cpan.org/perldoc?HTML::Obliterate

For stripping down to text, I usually prefer to use HTML::Parser and
something like their example script:

http://cpansearch.perl.org/src/GAAS/HTML-Parser-3.60/eg/htext

HTML::Parser can usually handle improper HTML better, at the expense of
speed.

HTML::Strip seems like a good alternative to the HTML::Obliterate
mentioned above as well:

http://search.cpan.org/~kilinrax/HTML-Strip-1.06/Strip.pm

HTML::Strip is written in XS and claims to be about 5 times quicker than
a regexp-based approach. Whether that's true or not is up to someone
else to test.

Thanks for the HTML::Truncate suggestion. I may end up checking that out
in the future.

-- 
Josh

_______________________________________________
templates mailing list
http://mail.template-toolkit.org/mailman/listinfo/templates
