On Sun, Jan 05, 2003 at 12:16:38PM -0600, Earl Hood wrote:
> > > This is Bad Juju (tm). It _guarantees_ script breakage (potentially
> > > silently!) for Unix people doing _anything_ but ASCII text manipulation.
> >
> > I repeat: I don't think you can do "more than ASCII" by hanging tooth
> > and nail to the "everything is bytes" credo.
>
> This statement assumes someone is working with characters. It is
> common for many to use regexes and other operators (substr, index,
> et al.) on binary data directly.

True. What I was referring to (somewhere earlier in my message) is that
you won't get Unicode data mixed into your data unless you ask for it,
explicitly or implicitly.

> > I repeat: all your filehandles are still 'binary' unless you either
> > explicitly (binmode) or implicitly (locale) command them not to be.
> > If you try to push Unicode (data marked as UTF-8, such as characters
> > beyond 255) on such a filehandle, you'll get a 'Wide character' warning.
> > If you do not like the locale implicit switching, reset your locale
> > to something without /utf-?8/i in it before running the script.
>
> I think this reasoning is flawed since it assumes the author of
> the script has complete control over the environment. For example,
> the script can be used by others in environments the author does not
> control. Therefore, older programs can quietly break, or behave
> differently.
>
> According to the perllocale manpage, locale should have no effect
> unless the 'use locale' pragma is specified. It appears from
> Benjamin's script that he is not using the pragma, so even if the
> environment has a utf-8 locale, the script should be unaffected.

True, too. Enabling UTF-8ness based on the locale is an exception to
how things were done before. But I'm delegating responsibility for that
decision to Larry Wall :-) I'm trying to get an opinion about this from
him, and I just logged a problem ticket about this issue.

> --ewh

--
Jarkko Hietaniemi <[EMAIL PROTECTED]> http://www.iki.fi/jhi/
"There is this special biologist word we use for 'stable'. It is 'dead'."
-- Jack Cohen
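
The filehandle behavior discussed above can be illustrated with a minimal
sketch. It assumes a Perl with PerlIO layers (5.8 or later) and an
environment in which nothing (locale, command-line switches, 'use open')
has already put a UTF-8 layer on newly opened handles. The variable names
and the in-memory filehandles are illustrative choices, not taken from the
thread.

```perl
use strict;
use warnings;

# Collect warnings in an array instead of printing them to STDERR.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

# A string containing a character beyond 255 (U+263A), so Perl marks it as UTF-8.
my $text = "smile: \x{263A}\n";

# Case 1: a filehandle with no encoding layer, i.e. still 'binary'.
# (Assumption: no default UTF-8 layer is applied by the environment.)
my $raw_buffer = '';
open(my $raw_fh, '>', \$raw_buffer) or die "cannot open in-memory file: $!";
print {$raw_fh} $text;    # expected to warn: "Wide character in print"
close($raw_fh);
my $raw_warnings = grep { /Wide character/ } @warnings;

# Case 2: the script explicitly asks for UTF-8 output with binmode.
@warnings = ();
my $utf8_buffer = '';
open(my $utf8_fh, '>', \$utf8_buffer) or die "cannot open in-memory file: $!";
binmode($utf8_fh, ':utf8');
print {$utf8_fh} $text;   # no warning expected: the handle now accepts characters
close($utf8_fh);
my $utf8_warnings = grep { /Wide character/ } @warnings;

print "binary handle:  $raw_warnings 'Wide character' warning(s)\n";
print "binmode handle: $utf8_warnings 'Wide character' warning(s)\n";
```

Only the second handle was told to accept characters, and that is the
explicit route. The locale-based switching under discussion would be the
implicit route to the same layer.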