Dan Kogai <[EMAIL PROTECTED]> writes:
>First, thank you all for perl@14503.
>
>On 2002.01.30, at 07:07, Nick Ing-Simmons wrote:
>> If I run the compile script on it and build Encode::EUC_JP
>> as an XS extension and change Encode::Tcl to ....
>
> I also made Encode::JP::SHIFTJIS, with Encode::EUC_JP as a template
>(Also relocated Encode::EUC_JP to Encode::JP::EUC_JP) and it also
>worked. I have a feeling this will work for other CJK.
> Now the problem is escape-based codings such as ISO-2022.
Can you explain the way those work? I can imagine two ways to decode:

A. Keep going with the current sub-encoding until we get a failure, then look at the next few octets for an escape sequence.

B. Scan ahead for the next escape sequence (or the end of available input), then translate up to that point.

A is easy, but as all the escape sequences seem to be valid ASCII, it does not work: while the current sub-encoding is ASCII, an escape sequence never causes a failure. B requires an irritating double scan.

For encode there is a different pain. For each code point we need an efficient way to find out whether a sub-encoding can represent that point. A bitmap of 0x10FFFF entries does not seem good, so it is either an auxiliary table or try-it-and-see (which should not be too bad with the C version).

> Another small problem is that XS-based encoding consumes a whole
>directory immediately under perl/ext/Encode. Well, I can live with a
>few dozen more.

You could bundle several encodings in one XS module (the way Encode itself bundles ASCII, iso-8859-* and koi8). If any of the bundled encodings have similar sequences of code points, we will also get an overall reduction in table size. In the limit one could have Encode::CJK, but perhaps Encode::JP / Encode::CN / Encode::KR makes more sense?

> And the speed of the compile script may be a problem if we want all
>CJK to be XS-based. It takes roughly 25 seconds to compile a single
>CJK encoding on my FreeBSD box. Well, I can live with that too, but
>other porters may find it frustrating....

We could ship things pre-compiled (with the original .ucm files gzipped, or with a way to extract a .ucm from the compiled form). Also, the compile process is all in Perl and has not really been tuned. It spends a lot of time trying to find common "strings", which is a win because it gets the tables down in size.

> I think we are making significant progress in CJK....
>
>Dan

-- 
Nick Ing-Simmons
http://www.ni-s.u-net.com/
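Decode approach B (scan ahead for the next escape sequence, then translate the octets before it) can be sketched as below. This is a minimal Python illustration, not Encode's code: it assumes an ISO-2022-JP subset with only two designations, ESC ( B for ASCII and ESC $ B for JIS X 0208, and the names `decode_iso2022jp` and `_decode_run` are hypothetical. It borrows Python's `euc_jp` codec for the JIS X 0208 table, because setting the high bit on both bytes of a JIS X 0208 code gives the EUC-JP code for the same character.

```python
import re

# Approach B: find the next escape sequence first, then translate the run of
# octets before it with whichever sub-encoding is currently designated.
# Assumption: only ESC ( B (ASCII) and ESC $ B (JIS X 0208-1983) are handled;
# a full ISO-2022-JP decoder also needs ESC ( J and ESC $ @.
ASCII = b"\x1b(B"
JISX0208 = b"\x1b$B"
ESCAPE = re.compile(rb"\x1b\(B|\x1b\$B")


def _decode_run(run: bytes, mode: bytes) -> str:
    """Translate one run of octets that contains no recognised escape."""
    if b"\x1b" in run:
        # An escape we do not know (e.g. ESC ( J) would otherwise slip
        # through the ASCII branch unnoticed, since ESC is valid ASCII.
        raise ValueError("unsupported escape sequence")
    if mode == ASCII:
        return run.decode("ascii")
    # JIS X 0208 code + 0x80 on each byte == EUC-JP code for that character.
    return bytes(b | 0x80 for b in run).decode("euc_jp")


def decode_iso2022jp(data: bytes) -> str:
    out = []
    mode = ASCII  # ISO-2022-JP text starts out in ASCII
    pos = 0
    while True:
        m = ESCAPE.search(data, pos)  # first scan: locate the next escape
        end = m.start() if m else len(data)
        # second scan: the codec re-reads the same octets to translate them
        out.append(_decode_run(data[pos:end], mode))
        if m is None:
            return "".join(out)
        mode = m.group()
        pos = m.end()
```

For example, `decode_iso2022jp("日本語 abc".encode("iso2022_jp"))` returns `"日本語 abc"`. The explicit ESC check in `_decode_run` is there because of the problem with approach A: in ASCII mode nothing else would notice an escape sequence.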
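The try-it-and-see option for encode can be sketched as follows: for each character, ask each sub-encoding in turn whether it can represent it, and emit an escape sequence only when the chosen sub-encoding changes. This is a hypothetical Python illustration under the same assumed ISO-2022-JP subset (ASCII via ESC ( B, JIS X 0208 via ESC $ B); `encode_iso2022jp`, `_try_ascii`, `_try_jisx0208` and `SUBENCODINGS` are illustrative names, and the JIS X 0208 trial again borrows Python's `euc_jp` codec.

```python
from typing import Optional

# Try-it-and-see: each sub-encoding is asked to encode the character and
# answers with bytes, or None if it cannot represent it.
# Assumption: the same two sub-encodings as ESC ( B and ESC $ B.
ASCII = b"\x1b(B"
JISX0208 = b"\x1b$B"


def _try_ascii(ch: str) -> Optional[bytes]:
    return ch.encode("ascii") if ord(ch) < 0x80 else None


def _try_jisx0208(ch: str) -> Optional[bytes]:
    try:
        euc = ch.encode("euc_jp")
    except UnicodeEncodeError:
        return None
    # Only two-byte EUC-JP codes with both bytes in 0xA1..0xFE are JIS X 0208;
    # codes prefixed with 0x8E or 0x8F come from other character sets.
    if len(euc) == 2 and all(0xA1 <= b <= 0xFE for b in euc):
        return bytes(b & 0x7F for b in euc)
    return None


# Tried in order for every character.
SUBENCODINGS = [(ASCII, _try_ascii), (JISX0208, _try_jisx0208)]


def encode_iso2022jp(text: str) -> bytes:
    out = bytearray()
    current = ASCII
    for ch in text:
        for escape, try_encode in SUBENCODINGS:
            encoded = try_encode(ch)
            if encoded is not None:
                break
        else:
            raise ValueError(f"no sub-encoding can represent {ch!r}")
        if escape != current:  # switch sub-encoding only when needed
            out += escape
            current = escape
        out += encoded
    if current != ASCII:
        out += ASCII  # switch back so the text ends in ASCII
    return bytes(out)
```

An auxiliary table would replace the inner loop with a single lookup per character (for example a dict from code point to sub-encoding, built once), trading memory for the repeated trial encodes done here.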