On Mon, May 10, 2004 at 04:45:55PM +0100, Nick Ing-Simmons wrote:
: Larry Wall <[EMAIL PROTECTED]> writes:
: >
: >Right now, the meaning of "text" is subject to severe distortions
: >due to legacy issues.  But in the long run, "text" is going to mean
: >Unicode, and that probably means a UTF-8 file encoding at least in
: >the western world,
:
: Microsoft seem to be somewhat focused on some 16-bit form.

Yeah, well, they've never minded if you have to buy a new computer
to run their new software...  :-)

: This thread started as complaint that perl5 can't read a
: script saved as UCS-2/UTF-16 or whatever Windows uses.

That's why I said "probably".  And I probably should have said
"hopefully" instead.  :-)

But my main point was that "text" will eventually mean "Unicode",
whether or not that means "UTF-8".  (I probably should have
parenthesized the two subthoughts about what will end up the
default where.)

Really, though, once you've guaranteed a Unicode view at the
appropriate input boundaries, the differences between the various
UTFs should be fairly insignificant from a language point of view,
provided you maintain the abstractions.  The Perl 5 engine
unfortunately doesn't provide quite enough abstraction power to pull
it off.  We're hoping to do a better job of pulling it off with
Perl 6, but that implies a more strongly typed string implementation
underneath than Perl 5 provides.

Perl's always been about providing reasonable defaults, and will
continue to do so.  But changing what's reasonable is tricky, and
sometimes you have to go through a period in which nothing can be
considered reasonable.

Larry