Mikhael Goikhman Wrote:

> Is there any practical unsolvable problem to always work with non utf-8
> flagged data only (input from or output to file, socket, cgi, db, other
> modules)? And whenever you need to operate on multibyte characters you
> may write a function for each such case, for example "trim" or "cut" that
> does "decode_utf8", then regexp or "substr", then "encode_utf8" back. And
> if you like, your function may also support both cases (using _is_utf8)
> and return the output in the same manner (with or without utf8 flag).

Well, that's how it works right now. I'm just worried that Template 
Toolkit will get confused handling utf8 data as latin1 data. But that's 
very unlikely.
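
For the record, the kind of per-case wrapper Mikhael describes looks 
roughly like this (my own untested sketch, "trim_utf8" is just an 
example name):

    use Encode qw(decode_utf8 encode_utf8 is_utf8);

    # Trim whitespace; accepts either raw UTF-8 octets or an already
    # decoded (utf8-flagged) string, and returns the result in the same
    # form it was given.
    sub trim_utf8 {
        my ($data) = @_;
        my $had_flag = is_utf8($data);
        my $str = $had_flag ? $data : decode_utf8($data);
        $str =~ s/^\s+|\s+$//g;    # operate on characters, not octets
        return $had_flag ? $str : encode_utf8($str);
    }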

Don't forget that doing it this way will introduce weird characters 
everywhere. Theoretically, a Hebrew char can be 0x5D + 0x10, and then 
suddenly you have \r in your stream and weird things happen.
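
For instance, just to see what the raw octets of a Hebrew letter look 
like when the text is handled as plain bytes (quick sketch, not checked):

    use Encode qw(encode_utf8);

    my $alef  = "\x{05D0}";           # Hebrew letter Alef, one character
    my $bytes = encode_utf8($alef);   # the same thing as raw octets
    printf "%02X ", ord($_) for split //, $bytes;
    print "\n";                       # prints: D7 90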

And now we have the question of whether all the modules can handle 
weird/control chars in the text, or whether we should just go all the 
way and treat it as binary.

I'll go test a few modules...
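
Probably something along these lines (a rough sketch of the kind of 
check I mean, using Template Toolkit; the template text and variable 
names are made up):

    use Template;
    use Encode qw(encode_utf8);

    my $tt   = Template->new();
    my $text = "\x{05E9}\x{05DC}\x{05D5}\x{05DD}";   # "shalom", 4 chars

    # Feed the same text once as decoded characters and once as octets,
    # and see what comes out the other side.
    for my $case ([chars => $text], [octets => encode_utf8($text)]) {
        my ($label, $value) = @$case;
        my $out = '';
        $tt->process(\'[% s %]', { s => $value }, \$out)
            or die $tt->error();
        printf "%-6s length(out) = %d\n", $label, length($out);
    }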

Shmuel.
