Hi! As we know, utf8::encode() does not produce strictly correct UTF-8, and Encode::encode("UTF-8", ...) should be used instead. Likewise, files should be opened with the :encoding(UTF-8) layer instead of :utf8.
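To make the difference concrete, here is a minimal sketch that encodes a string containing a code point above U+10FFFF both ways and prints the resulting bytes in hex. The test string is my own choice for illustration. I would expect utf8::encode() to pass the code point through as a four-byte sequence and the strict encoder to substitute it (CHECK defaults to 0), but please run it against your own Encode version rather than taking my word for it.

```perl
use strict;
use warnings;
use Encode ();

# Illustrative input: U+110000 lies beyond the Unicode range,
# so strict UTF-8 has no valid encoding for it.
my $input = "A" . chr(0x110000);

# utf8::encode() converts in place, so work on a copy.
my $lax = $input;
utf8::encode($lax);

# "UTF-8" selects the strict encoder. No CHECK argument is given,
# so unencodable characters are substituted rather than fatal.
my $strict = Encode::encode("UTF-8", $input);

printf "utf8::encode:           %s\n", unpack("H*", $lax);
printf "Encode::encode(UTF-8):  %s\n", unpack("H*", $strict);
```

If the two hex dumps differ, the strict encoder refused to emit the out-of-range sequence that utf8::encode() produced.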
But the strict UTF-8 implementation in the Encode module is horribly slow compared to utf8::encode(). It is implemented in the Encode.xs file, and for benchmarking this XS implementation can be called directly, without the overhead of the Encode module:

    use Encode;
    my $output = Encode::utf8::encode_xs({strict_utf8 => 1}, $input);

Here are my results on a 160-byte input string:

    Encode::utf8::encode_xs({strict_utf8 => 1}, ...): 8 wallclock secs ( 8.56 usr + 0.00 sys = 8.56 CPU) @ 467289.72/s (n=4000000)
    Encode::utf8::encode_xs({strict_utf8 => 0}, ...): 1 wallclock secs ( 1.66 usr + 0.00 sys = 1.66 CPU) @ 2409638.55/s (n=4000000)
    utf8::encode: 1 wallclock secs ( 0.39 usr + 0.00 sys = 0.39 CPU) @ 10256410.26/s (n=4000000)

I found two bottlenecks (the slow sv_catpv* and utf8n_to_uvuni functions) and did some optimizations. The final results are:

    Encode::utf8::encode_xs({strict_utf8 => 1}, ...): 2 wallclock secs ( 3.27 usr + 0.00 sys = 3.27 CPU) @ 1223241.59/s (n=4000000)
    Encode::utf8::encode_xs({strict_utf8 => 0}, ...): 1 wallclock secs ( 1.68 usr + 0.00 sys = 1.68 CPU) @ 2380952.38/s (n=4000000)
    utf8::encode: 1 wallclock secs ( 0.40 usr + 0.00 sys = 0.40 CPU) @ 10000000.00/s (n=4000000)

The patches are on GitHub as a pull request: https://github.com/dankogai/p5-encode/pull/56

I would appreciate it if somebody could review my patches and tell me whether this is the right way to optimize this.
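For anyone who wants to run a comparable measurement, here is a rough sketch using the core Benchmark module. It is not the script that produced the numbers above: it goes through the public encoding objects returned by Encode::find_encoding() instead of calling the XS function directly, and both the input string and the iteration count are assumptions chosen for illustration, so absolute figures will differ.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);
use Encode ();

# Assumed input: 80 characters that encode to 160 bytes of UTF-8
# (1 byte for "a" plus 3 bytes for U+263A, repeated 40 times).
my $input = "a\x{263A}" x 40;

# "UTF-8" is the strict encoding, "utf8" the lax one.
my $strict = Encode::find_encoding("UTF-8");
my $lax    = Encode::find_encoding("utf8");

# Assumed iteration count; raise it for more stable timings.
my $count = 200_000;

timethese($count, {
    'Encode UTF-8 (strict)' => sub { my $out = $strict->encode($input) },
    'Encode utf8 (lax)'     => sub { my $out = $lax->encode($input) },
    # utf8::encode() works in place, so each run encodes a fresh copy.
    'utf8::encode'          => sub { my $out = $input; utf8::encode($out) },
});
```

Because the utf8::encode case has to copy the input on every iteration, it is slightly penalized relative to a pure in-place call; the comparison is still useful for seeing the relative gap between the strict and lax paths.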