On Thu, Jan 31, 2002 at 04:19:23AM +0900, Dan Kogai wrote:
> And the speed of the compile script may be a problem if we want all
> CJK to be XS-based.  It roughly takes about 25 seconds to compile single
> CJK encoding on my FreeBSD box.  Well, I can live with that too but
> other porters may find it frustrating....

Now that I've re-read this message I've just noticed that paragraph.
I did get frustrated with it.

1: It's too slow.
2: It uses too much RAM. (Well, that's subjective, but my FreeBSD box only
   has 16M total, and it was not a happy bunny, swapping like crazy and
   taking over an hour to run 5 minutes' worth of CPU time.)

So I've been re-jigging it (and Jarkko has been committing the improvements
to bleadperl). I'm not sure whether you're subscribed to p5p. By yesterday I
think it was 37% faster at compiling EUC_JP, and I've found some more things
to tweak today.

[e.g. I just found that using

    (unpack "n*", pack "H*", $line)

makes it 2.5% faster than

    (map {hex $_} $line =~ /(....)/g)

I think that that is portable to big endian, and to 64 bit.]

I hope that I've not been tramping on things you've been doing. It's still
making output files that are byte-for-byte identical with what the original
of last week did.

I've got a question about FFFD. The original compile script does this:

    for (my $j = 0; $j < 16; $j++) {
        no strict 'refs';
        my $ech = &{"encode_$type"}($ch,$page);
        my $val = hex(substr($line,0,4,''));
        next if $val == 0xFFFD;
        if ($val || (!$ch && !$page)) {
            my $el = length($ech);
            $max_el = $el if (!defined($max_el) || $el > $max_el);
            $min_el = $el if (!defined($min_el) || $el < $min_el);
            my $uch = encode_U($val);
            if (exists $seen{$uch}) {
                warn sprintf("U%04X is %02X%02X and %02X%02X\n",
                             $val,$page,$ch,@{$seen{$uch}});
            } else {
                $seen{$uch} = [$page,$ch];
            }
            enter($e2u,$ech,$uch,$e2u,0);
            enter($u2e,$uch,$ech,$u2e,0);
        } else {
            # No character at this position
            # enter($e2u,$ech,undef,$e2u);
        }
        $ch++;
    }

Is there a bug? Should the $ch++ happen even for the cases where
$val == 0xFFFD? Currently it looks like $ch is not incremented when the
input value is 0xFFFD, because the "next" skips the rest of the loop body,
including the $ch++ at the end.

Nicholas Clark
-- 
EMCFT http://www.ccl4.org/~nick/CV.html