Lee.M wrote:
> On Apr 10, 2009, at 10:58 AM, Lee.M wrote:
> 
>> Along w/ the problem of unbalancing tags there is also the white space
>> issue (e.g. you want 100 characters, but you could have 'a' .
>> $ninety_five_spaces . 'b' . $tons_of_text, and the truncated verbiage
>> is essentially 'a b')
>>
>> length of character entities (e.g. &lt; == 1 character, not 4)
>>
>> Fortunately it looks like someone has already addressed all of that
>> for us :)
>>
>> http://search.cpan.org/perldoc?HTML::Truncate
> 
> Also, if you want to go the opposite route of just making it plain text:
> 
> http://search.cpan.org/perldoc?HTML::Obliterate

For stripping down to text, I usually prefer to use HTML::Parser and something 
like their example script:
http://cpansearch.perl.org/src/GAAS/HTML-Parser-3.60/eg/htext

HTML::Parser usually copes better with malformed HTML, at the expense of speed.
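
A minimal sketch of that approach, modelled on the htext example linked above.
The html_to_text helper and its whitespace clean-up are illustrative
assumptions, not part of HTML::Parser itself:

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: returns the visible text of an HTML string.
sub html_to_text {
    my ($html) = @_;
    my %inside;       # nesting depth per tag name
    my $out = '';

    # Called for start tags with '+1' and for end tags with '-1'.
    my $tag = sub {
        my ($tagname, $delta) = @_;
        $inside{$tagname} += $delta;
        $out .= ' ';  # keep the words on either side of a tag apart
    };

    # Called for text; 'dtext' means entities are already decoded.
    my $text = sub {
        my ($decoded) = @_;
        return if $inside{script} || $inside{style};
        $out .= $decoded;
    };

    my $p = HTML::Parser->new(
        api_version => 3,
        handlers    => [
            start => [ $tag,  "tagname, '+1'" ],
            end   => [ $tag,  "tagname, '-1'" ],
            text  => [ $text, 'dtext' ],
        ],
    );
    $p->parse($html);
    $p->eof;

    # Collapse runs of whitespace and trim both ends.
    $out =~ s/\s+/ /g;
    $out =~ s/^\s+|\s+$//g;
    return $out;
}

print html_to_text('<p>Fish &amp; <b>chips</b></p><script>alert(1)</script>'), "\n";
```

Emitting a space for every tag is crude (it also splits words around inline
tags), but it keeps the sketch short.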

HTML::Strip seems like a good alternative to the HTML::Obliterate mentioned 
above as well:
http://search.cpan.org/~kilinrax/HTML-Strip-1.06/Strip.pm

HTML::Strip is written in XS and says it's about 5 times quicker than regexps.
Whether that's true or not is up to someone else to test.
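
For comparison, basic HTML::Strip usage looks roughly like this. The strip_html
wrapper is a hypothetical convenience, and the module's handling of badly
broken markup is not something this sketch tests:

```perl
use strict;
use warnings;
use HTML::Strip;

# Hypothetical wrapper: strips markup from a single HTML string.
sub strip_html {
    my ($html) = @_;
    my $hs   = HTML::Strip->new();
    my $text = $hs->parse($html);
    $hs->eof;    # reset parser state so the object could be reused
    return $text;
}

print strip_html('<p>Hello <b>world</b></p>'), "\n";
```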

Thanks for the HTML::Truncate suggestion. I may end up checking that out in the 
future.
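
For anyone who wants to try it, a minimal sketch based on the module's
synopsis might look like this. The truncate_html wrapper and the 20-character
limit are illustrative assumptions; how whitespace runs and entities are
counted is as described in the quoted post, not verified here:

```perl
use strict;
use warnings;
use HTML::Truncate;

# Hypothetical wrapper: truncate $html to roughly $max_chars visible characters.
sub truncate_html {
    my ($html, $max_chars) = @_;
    my $ht = HTML::Truncate->new();
    $ht->chars($max_chars);          # limit on text characters, not markup
    return $ht->truncate($html);     # open tags are closed in the result
}

print truncate_html('<p>' . ('word ' x 50) . '</p>', 20), "\n";
```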

-- Josh

_______________________________________________
templates mailing list
[email protected]
http://mail.template-toolkit.org/mailman/listinfo/templates
