Marco Baroni <[EMAIL PROTECTED]> writes:

> >> Now for a much less pressing issue: Does anybody know of something
> >> similar to the HTML::FormatText module that can take utf-8 input, and
> >> generate utf-8 output?
> >
> > Doubt it. But if you run it on Unicode chars (as indicated above)
> > then unless it is doing something too clever it should just work.
>
> Could it be that the problem is with HTML::TreeBuilder (which is
> required for pre-processing by HTML::FormatText)? Does anybody know if
> this module has issues with Unicode?
It probably does, as it uses HTML::Parser underneath. HTML::Parser is not
really Unicode aware: the strings passed to the event callbacks do not
preserve the UTF8 flag of the strings it parses. A workaround is to pass it
UTF-8 encoded bytes instead of character strings. I would also welcome patch
suggestions that make HTML::Parser propagate the UTF8 flag.

Regards,
Gisle
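The workaround of feeding HTML::Parser UTF-8 encoded bytes can be sketched as below. This is an illustration only, not code from HTML::Parser or HTML::FormatText: `extract_text` is a hypothetical helper, and it assumes HTML::Parser (a CPAN module, not core Perl) and the core Encode module are installed. It encodes the character string to UTF-8 bytes before parsing, collects the raw `text` of each text event, and decodes the joined bytes back into characters at the end, so the lost UTF8 flag no longer matters.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);
use HTML::Parser ();

# Hypothetical helper: return the text content of an HTML character string.
# The parser only ever sees UTF-8 bytes, so it never has to handle the UTF8 flag.
sub extract_text {
    my ($html) = @_;    # a Perl character string (may contain chars > 255)
    my @chunks;
    my $p = HTML::Parser->new(
        api_version => 3,
        # The 'text' argspec delivers the raw source text (entities are not
        # decoded), so the UTF-8 bytes pass through untouched.
        text_h => [ sub { push @chunks, $_[0] }, 'text' ],
    );
    $p->parse(encode_utf8($html));    # feed bytes, not characters
    $p->eof;
    # Join before decoding so a multi-byte sequence split across two
    # text events is still decoded correctly.
    return decode_utf8(join '', @chunks);
}

binmode STDOUT, ':encoding(UTF-8)';
print extract_text("<p>caf\x{e9} \x{263A}</p>"), "\n";
```

Decoding only once, after all events are collected, is the design choice that keeps the sketch safe; decoding inside the callback would also work as long as the parser never splits a text event in the middle of a character.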