Nick Ing-Simmons <[EMAIL PROTECTED]> writes:
>> You can use t/table.euc under Jcode module for instance. table.utf8
>> in my code example is just a utf8 version thereof. That's data which
>> contains all characters defined in EUC (well, actually JISX0212 is not
>> included but very few environments can display JISX0212).
>
>It is really great to have some valid data!
>
>For a start it has found a bug in :encoding layer - knew there must be some...
>(I think I have rediscovered the multi-byte char spanning buffer boundary
>bug ... which I could not reproduce before)
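
For anyone who has not met the buffer-boundary case, here is a minimal
sketch of it. It is not the :encoding layer's code. It assumes the
Encode::FB_QUIET check constant found in present-day Encode, which is not
the "check" interface discussed in this message, and decode_in_chunks is
a hypothetical helper invented for the illustration. The input is cut
into 4-octet buffers, so 3-octet UTF-8 characters straddle the
boundaries, and the undecoded tail is carried over to the next buffer.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not part of Encode or the :encoding layer):
# decode $octets as if they arrived in fixed-size buffers of $bufsize octets.
sub decode_in_chunks {
    my ($encoding, $octets, $bufsize) = @_;
    my ($pending, $out) = ('', '');
    for (my $pos = 0; $pos < length $octets; $pos += $bufsize) {
        # Append the next buffer to whatever was left over last time.
        $pending .= substr($octets, $pos, $bufsize);
        # FB_QUIET: return what decoded cleanly and leave the
        # unprocessed tail (e.g. a partial character) in $pending.
        $out .= decode($encoding, $pending, Encode::FB_QUIET);
    }
    return ($out, $pending);
}

my $text   = "\x{3042}\x{3044}\x{3046}";   # three characters, 3 octets each in UTF-8
my $octets = encode('UTF-8', $text);       # 9 octets in total
my ($decoded, $leftover) = decode_in_chunks('UTF-8', $octets, 4);
print $decoded eq $text ? "round trip ok\n" : "mismatch\n";
```

A decoder that treats the partial character at the end of each buffer as
an error, instead of holding it back, is what gives the bug.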

That is it - :encoding needs some serious re-work for any encoding which
will whinge about partial characters (8-bit never does, and 16-bit is
unlikely to with even-length buffers, but multi-byte encodings can).
But since layers are much more stable now it can be recoded in a better
manner anyway.

To do that it needs to know why encode/decode stopped - did they "fail"
or just "pause"?

So the ->decode and ->encode methods are going to get tweaked as hinted at
in the existing pod. I am currently leaning towards allowing "check" to
be a reference, something like:

  $uni = $enc->decode($octets);         # best attempt + replacement chars
  $uni = $enc->decode($octets,0);       # croak on error ?
  $uni = $enc->decode($octets,1);       # stop on error
  $uni = $enc->decode($octets,\$err);   # stop on error, reason code in $err
  $uni = $enc->decode($octets,\&foo);   # call foo on error - protocol TBD

I need to think through a sane set of "numeric" check options, perhaps
a "mask" of which errors are croak/replace/stop/ignored?

I think you can deduce something from the return value as well, e.g. if it
returns a +ve length but does not consume the whole string then that is
the result so far. To find out why, call it again:

 - undef means no representation
 - defined but zero length means partial char
 - +ve length means we had run out of room
   (does not occur at perl level as SV can grow...)

-- 
Nick Ing-Simmons
http://www.ni-s.u-net.com/