Nick Ing-Simmons <[EMAIL PROTECTED]> writes:
>Encode::Tcl is too slow - even for 8-bit - which is why I wrote the
>engine which works from the "compiled" form.
>
>Have you tried using ext/Encode/compile to build an XS module for
>EUC ?
>
>>The example above on my FreeBSD box, Pentium III 800 MHz and
>>512MB RAM took some two seconds to show the result (Its performance is
>>not too bad once the internal table is full).
>
>If I had _ANY_ test data I would run the compiled test and give you
>the comparative number.

 >   You can use t/table.euc from the Jcode module, for instance.  table.utf8 
 > in my code example is just a utf8 version of it. That file contains 
 > all characters defined in EUC (well, actually JISX0212 is not 
 > included but very few environments can display JISX0212).

It is really great to have some valid data!

For a start it has found a bug in the :encoding layer - I knew there must be some...
(I think I have rediscovered the bug where a multi-byte char spans a buffer
boundary ... which I could not reproduce before)
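
For anyone unfamiliar with that class of bug: when a stream is decoded in
fixed-size chunks, a two- or three-byte EUC-JP character can be cut in half at
the end of a read. The following is only a minimal sketch of one way to cope
with that, not the :encoding layer's actual code. It relies on the documented
Encode::FB_QUIET behaviour of leaving undecoded bytes in the source buffer;
the helper name decode_in_chunks and the chunk size of 3 are made up for
illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: decode $bytes in $chunk_size-byte slices, carrying
# any incomplete trailing character over into the next slice.
sub decode_in_chunks {
    my ($enc_name, $bytes, $chunk_size) = @_;
    my ($pending, $out) = ('', '');
    for (my $pos = 0; $pos < length $bytes; $pos += $chunk_size) {
        $pending .= substr($bytes, $pos, $chunk_size);
        # FB_QUIET returns what could be decoded and leaves the rest
        # (here, a partial character) in $pending.
        $out .= decode($enc_name, $pending, Encode::FB_QUIET);
    }
    die "Undecodable bytes left over" if length $pending;
    return $out;
}

# Three hiragana characters, two bytes each in EUC-JP.
my $euc = encode('euc-jp', "\x{3042}\x{3044}\x{3046}");
# A chunk size of 3 splits the second character across two slices.
my $uni = decode_in_chunks('euc-jp', $euc, 3);
```

Decoding each slice on its own, without the carry-over in $pending, is the
naive approach that would mangle the character straddling the boundary.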

But avoiding that bug with this script, which reads the file raw and decodes it in one call:

use Encode;
use Encode::Tcl;

open(my $jp,"<","table.euc") || die "Cannot open table.euc:$!";
my $text = join('',<$jp>);  
close($jp);
my $enc  = find_encoding('euc-jp');
if ($enc)
 {
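  # With a true CHECK argument decode() removes the bytes it translates from $text
  my $uni = $enc->decode($text,1);
  # so anything still left in $text could not be decoded
  if (length $text)
   {
    die "Failed to translate";
   }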
  open(my $un,">:utf8","table.utf8") || die "Cannot open table.utf8:$!";
  print $un $uni;
  close($un);
 }

I get 

nick@bactrian 624$ time ../../perl -I../../lib try2
 
real    0m1.389s
user    0m1.370s
sys     0m0.020s
nick@bactrian 624$             

And the output file is byte-for-byte identical to what Linux iconv produces.
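
If you want to make the same comparison from Perl rather than with cmp, a
sketch along these lines should do, using the core File::Compare module. The
helper name same_bytes is hypothetical, and the file names you pass are
whatever your script and your iconv run wrote.

```perl
use strict;
use warnings;
use File::Compare qw(compare);

# Hypothetical helper: true if the two files have identical contents.
sub same_bytes {
    my ($file_a, $file_b) = @_;
    my $rc = compare($file_a, $file_b);   # 0 = equal, 1 = different, -1 = error
    die "Cannot compare $file_a and $file_b: $!" if $rc < 0;
    return $rc == 0;
}
```

It would be called as same_bytes('table.utf8', $iconv_output), where
$iconv_output names the file iconv was redirected to.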

If I run the compile script on it, build Encode::EUC_JP
as an XS extension, and change the "use Encode::Tcl;" line to:

use Encode::EUC_JP;

I get 

nick@bactrian 626$ time ../../perl -I../../lib try2
real    0m0.197s
user    0m0.170s
sys     0m0.030s
nick@bactrian 626$             

Which is still worse than: 

nick@bactrian 626$ time iconv -f EUC-JP -t UTF-8 table.euc > expected
 
real    0m0.026s
user    0m0.010s
sys     0m0.020s
nick@bactrian 627$    

But IO is sub-optimal.
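
Since the figures above include reading and writing the files, one way to see
how much of the time is the decode itself is to time just that call. This is a
rough sketch under my own assumptions: timed_decode is a hypothetical helper,
Time::HiRes provides the clock, and the in-memory sample stands in for the
contents of table.euc.

```perl
use strict;
use warnings;
use Encode qw(find_encoding encode);
use Time::HiRes qw(gettimeofday tv_interval);

# Hypothetical helper: decode $bytes with the named encoding and return
# the decoded string together with the wall-clock seconds the decode took.
sub timed_decode {
    my ($enc_name, $bytes) = @_;
    my $enc = find_encoding($enc_name) or die "Unknown encoding $enc_name";
    my $t0  = [gettimeofday];
    my $uni = $enc->decode($bytes);
    return ($uni, tv_interval($t0));
}

# Small in-memory sample; a real run would slurp table.euc instead.
my $sample = encode('euc-jp', "\x{3042}\x{3044}\x{3046}" x 1000);
my ($uni, $secs) = timed_decode('euc-jp', $sample);
printf "decoded %d characters in %.6f s\n", length $uni, $secs;
```

Wall-clock time for a single short call is noisy, so treat any one reading as
indicative only.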

-- 
Nick Ing-Simmons
http://www.ni-s.u-net.com/
