Re: BOM and principle of least surprise

Nick Ing-Simmons Mon, 26 Apr 2004 08:51:43 -0700

Erland Sommarskog <[EMAIL PROTECTED]> writes:
>Nick Ing-Simmons ([EMAIL PROTECTED]) writes:
>> Erland Sommarskog <[EMAIL PROTECTED]> writes:
>>>I would really expect someone to have done this already, but I see no
>>>reference to such a module. Or layer-directive like "<:use-bom" to open
>>>the file. And then some way to open an output file "same mode as that
>>>handle". 
>> 
>> Seems you are the 1st (at least to care) - so in true OpenSource 
>> spirit you would write the module and contribute it.
>
>Unfortunately my field of expertise is not in the area of C++ programming
>or Perl internals. Believe me, you would not want to see my miserable
>code entered into the Perl code base. :-)


Well you only learn by trying - but that is your choice.

>
>I guess, that if I want to write a utility which can handle Unicode 
>files, that I will implement the file-opening in Perl in some private
>module.

That would be a reasonable way to prototype stuff for core anyway.
With perl5.7+'s "layers" it should be possible to do this as a module.
(Which was at least part of the motivation for inventing them.)
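
As a rough illustration of that kind of prototype, here is a minimal pure-Perl sketch. The names open_with_bom and open_output_like are hypothetical, not an existing core or CPAN API. It opens the file raw, sniffs the first few bytes for a BOM, and pushes a matching :encoding() layer. It makes no attempt at CRLF handling, and a file without a BOM is left as plain bytes.

```perl
use strict;
use warnings;

# Hypothetical helper (assumption, not an existing API): open a file for
# reading and pick an :encoding() layer from its byte-order mark, if any.
sub open_with_bom {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";

    # Read up to 4 bytes; a short file simply yields fewer.
    my $got = read($fh, my $head, 4);
    die "Cannot read $path: $!" unless defined $got;

    # Longest BOMs first: the UTF-32LE BOM starts with the UTF-16LE one.
    my @boms = (
        [ "\x00\x00\xFE\xFF", 'UTF-32BE' ],
        [ "\xFF\xFE\x00\x00", 'UTF-32LE' ],
        [ "\xEF\xBB\xBF",     'UTF-8'    ],
        [ "\xFE\xFF",         'UTF-16BE' ],
        [ "\xFF\xFE",         'UTF-16LE' ],
    );

    my ($encoding, $skip) = (undef, 0);
    for my $bom (@boms) {
        my ($bytes, $name) = @$bom;
        if (substr($head, 0, length $bytes) eq $bytes) {
            ($encoding, $skip) = ($name, length $bytes);
            last;
        }
    }

    # Rewind to just past the BOM, or to the start when there is none.
    seek($fh, $skip, 0) or die "Cannot seek in $path: $!";

    # With a BOM, decode accordingly; without one, leave the bytes alone.
    binmode($fh, ":encoding($encoding)") if defined $encoding;

    return ($fh, $encoding);
}

# Hypothetical companion: open an output file "in the same mode" by
# reusing the encoding name found above and writing a BOM first.
sub open_output_like {
    my ($path, $encoding) = @_;
    open(my $out, '>:raw', $path) or die "Cannot open $path: $!";
    if (defined $encoding) {
        binmode($out, ":encoding($encoding)");
        print {$out} "\x{FEFF}";    # encoded by the layer into the BOM bytes
    }
    return $out;
}
```

A caller would write something like my ($in, $enc) = open_with_bom($file); and then my $out = open_output_like($other, $enc); so that the output gets the same encoding and BOM as the input. A real layer written in C could do the same sniffing transparently inside open().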

> 
>> Many _programs_ yes. So when you write a perl _program_ you can 
>> handle it. C++ language doesn't do this for you, why should Perl?
>> Now there may well be a C++ _library_ which does this, so there 
>> could be a perl _library_ (module) which did it too.
>
>But Perl is not C++. C++ is a strongly typed language where you use
>different functions for 8-bit and Unicode data. Perl is also a higher-
>level language that does more work for me. 

But there is a limit - or there would be just one perl program:

#!/usr/bin/perl
exit(do_what_I_mean(@ARGV));

>I'd say that it would be
>perfectly in the spirit of Perl to magically handle file as ASCII or
>Unicode without me having to bother.

Agreed - but magic doesn't create itself.

> 
>> It would seem best place to do this would be to change 
>> the initial layer in Win32 to a new layer (say :bomcrlf).
>> This layer would get popped on binmode() - fixing above.
>> It would look at 1st few bytes it got from OS and then if it was 
>> a BOM push an encoding() layer beneath itself and mutate into 
>> a :crlf layer with UTF8 flag set.
>
>Yes, that sounds like a good way that would ensure compatibility and
>still give me what I want. When is Santa coming to town? :-)

Implied timescale sounds viable ;-)

>
>However, that does not really help when the Perl script itself is in
>UTF-16 or UTF-8.

Yes it does - I _think_ one or more of 

perl -MWin32BOM UTF-16_script

or 

set PERL5OPT=-MWin32BOM

or 

set PERLIO=bomcrlf
(with magical autoload)

could be made to work.

If it happens in core-perl it can certainly work. 

> 
>Anyway, thanks for all the replies. This is not really a big deal for
>me at the moment. I was just puzzled by the results of my tests. Since
>I am working with a module that will support Unicode data, I'm a little
>nervous that I will get questions from users about the topic.
