Marco Baroni <[EMAIL PROTECTED]> writes:

> >> Now for a much less pressing issue: Does anybody know of something
> >> similar to the HTML::FormatText module that can take utf-8 input, and
> >> generate utf-8 output?
> >
> > Doubt it. But if you run it on Unicode chars (as indicated above)
> > then unless it is doing something too clever it should just work.
> >
> Could it be that the problem is with HTML::TreeBuilder (which is
> required for pre-processing by HTML::FormatText)? Does anybody know if
> this module has issues with Unicode?

It probably does, since it uses HTML::Parser underneath.  HTML::Parser
is not really Unicode-aware: the strings passed to the event callbacks
do not preserve the UTF8 flag of the strings it parses.

A workaround is to pass it UTF-8 encoded bytes and decode what the
callbacks hand back.  I would also welcome patch suggestions that make
HTML::Parser propagate the UTF8 flag.
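
A minimal sketch of that workaround, assuming the HTML::Parser version 3
callback API and the core Encode module.  The input string and the
variable names are made up for illustration, and later releases of
HTML::Parser may treat character strings differently, so take this as a
rough outline rather than a recipe:

```perl
use strict;
use warnings;
use Encode qw(encode decode);
use HTML::Parser ();

# Hypothetical input: a character string with non-ASCII content.
my $html = "<p>caf\x{e9} \x{263a}</p>";

# Collect the raw byte chunks the parser hands to the text callback.
my @chunks;
my $parser = HTML::Parser->new(
    api_version => 3,
    # 'text' asks for the source text as-is, with no entity decoding.
    text_h      => [ sub { push @chunks, $_[0] }, 'text' ],
);

# Feed the parser bytes, not characters.
$parser->parse(encode('UTF-8', $html));
$parser->eof;

# Decode once, after joining, so a multi-byte character that happened
# to be split across two chunks is still decoded correctly.
my $text = decode('UTF-8', join('', @chunks));
```

The same encode-before, decode-after pattern should carry over to
HTML::TreeBuilder and HTML::FormatText, since they sit on top of the
same parser, though I have not checked that here.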

Regards,
Gisle
