On Sun, Jan 05, 2003 at 12:16:38PM -0600, Earl Hood wrote:
> > > This is Bad Juju (tm). It _guarantees_ script breakage (potentially
> > > silently!) for Unix people doing _anything_ but ASCII text manipulation.  
> > 
> > I repeat: I don't think you can do "more than ASCII" by hanging tooth
> > and nail to the "everything is bytes" credo.
> 
> This statement assumes someone is working with characters.  It is
> common for many to use regexes and other operators (substr, index,
> et al.) on binary data directly.

True.  I think what I was referring to (somewhere earlier in my
message) is that you won't get Unicode data mixed into your data
unless you ask for it, explicitly or implicitly.
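
A minimal sketch of that point, assuming Perl 5.8.1 or later (for
utf8::is_utf8) and the core Encode module.  The sample byte strings
are made up for illustration.  Without an explicit decode the string
operators count bytes; characters only appear once you ask for them:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical binary data: an 8-byte header followed by 4 payload bytes.
my $data = "\x89PNG\x0d\x0a\x1a\x0a" . "\xff\xfe\x00\x01";

# Nothing asked for Unicode, so the string is not marked as UTF-8
# and length, index and regexes all work on bytes.
my $is_utf8 = utf8::is_utf8($data) ? 1 : 0;
my $len     = length $data;
my $pos     = index $data, "\xff\xfe";
my $matched = $data =~ /\A\x89PNG/ ? 1 : 0;

# Asking explicitly: decode three UTF-8 octets into one character.
my $octets = "\xe2\x82\xac";
my $chars  = decode('UTF-8', $octets);

printf "marked as UTF-8: %d, length: %d, index: %d, match: %d\n",
    $is_utf8, $len, $pos, $matched;
printf "octets: %d, characters: %d\n", length $octets, length $chars;
```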

> > I repeat: all your filehandles are still 'binary' unless you either
> > explicitly (binmode) or implicitly (locale) command them not to be.
> > If you try to push Unicode (data marked as UTF-8, such as characters
> > beyond 255) on such a filehandle, you'll get a 'Wide character' warning.
> > If you do not like the implicit locale switching, reset your locale
> > to something that does not match /utf-?8/i before running the script.
> 
> I think this reasoning is flawed since it assumes the author of
> the script has complete control over the environment.  For example,
> the script can be used by others in environments the author does not
> control.  Therefore, older programs can quietly break, or behave
> differently.
>
> According to the perllocale manpage, locale should have no effect
> unless the 'use locale' pragma is specified.  It appears from
> Benjamin's script that he is not using the pragma, so even if the
> environment has a utf-8 locale, the script should be unaffected.

True, too.  The enabling of UTF-8ness based on locale is an
exception to how things were done before.  But I'm delegating
responsibility for that decision to Larry Wall :-)
I'm trying to get an opinion about this from him, and I just logged
a problem ticket about this issue. 
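
For reference, the explicit case quoted above can be sketched as
follows.  This assumes that neither a UTF-8 locale, the -C switch nor
PERL_UNICODE has changed the default layers, and it uses in-memory
filehandles in place of real files.  The variable names are only for
illustration:

```perl
use strict;
use warnings;

my $euro = "\x{20ac}";    # one character beyond 255

# Collect warnings so the 'Wide character' message can be inspected.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# No layer requested: the handle expects bytes, so printing a wide
# character is expected to warn.
open my $raw, '>', \my $raw_buf or die "open: $!";
print {$raw} $euro;
close $raw;

# Layer requested explicitly: the character is written as UTF-8 octets
# and no warning is expected.
open my $out, '>:utf8', \my $utf8_buf or die "open: $!";
print {$out} $euro;
close $out;

printf "warnings: %d\n", scalar @warnings;
print  "first warning: $warnings[0]" if @warnings;
printf "octets written through :utf8: %d\n", length $utf8_buf;
```

Calling binmode($fh, ':utf8') on an already open handle is the other
explicit way to ask for the same behaviour.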

> --ewh

-- 
Jarkko Hietaniemi <[EMAIL PROTECTED]> http://www.iki.fi/jhi/ "There is this special
biologist word we use for 'stable'.  It is 'dead'." -- Jack Cohen
