Jarkko Hietaniemi ([EMAIL PROTECTED]) writes:
>> I said the principle of least surprise, because having read Perluniintro
>> my impression was that I should not really have to care in which format
>> the string was in.
>
> You should not need to care *once* the data has been read into Perl.
>
> Before that, in the input phase, Perl needs your help.

That is quite a serious restriction. It's like having a great car that can
do 200 km/h on the motorway, but that you have to push downhill to get it
started.

In many situations I, as the author of a Perl script, have no idea what
encoding the input might be in. I might be writing a general tool that
reads some file and performs some processing, with no idea what input the
user will feed me. And putting the burden on the user is not a good
solution, not in an environment where he never has to bother about this
with other tools.

It seems that the only way out is to first open the file in plain mode and
look at the first two or three bytes; if they are a BOM, close the file,
open it again with the appropriate options, and discard the BOM. I would
really expect someone to have done this already, but I see no reference to
such a module, or to a layer directive like "<:use-bom" for opening the
file. And then there should be some way to open an output file in the
"same mode as that handle", including correct handling of newlines in
UTF-16LE.

> If you have a stream of bytes Perl cannot start blindly guessing
> what data it might be.

Why not? It seems that many programs on Windows do precisely this.

> If from a BOM Perl should guess that input is in UTF-16, that would make
> it impossible to read the same file in as binary.

Not sure I understand. Could you elaborate?

> (Perl does recognize BOMs in Perl scripts, since it has to kind of its
> known format...)

Again, I don't understand what you are talking about. Maybe it is that
funny #! you use on Unix, but as I said, I mainly work on Windows. In any
case, I fail to see why first looking for a BOM, and then for #! once you
have deduced the encoding, would be impossible.

Is there any possibility at all of feeding Perl a script that has been
saved in UTF-16? (Which is how you normally save Unicode on Windows.)

-- 
Erland Sommarskog, Stockholm, [EMAIL PROTECTED]
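
P.S. To make the "peek at the BOM, then pick the layer" idea above concrete,
here is a minimal sketch. The names open_by_bom and @BOMS are hypothetical
(this is not an existing module), and the UTF-8 fallback for files without a
BOM is purely an assumption. Instead of closing and reopening, it seeks past
the BOM and pushes an :encoding layer onto the same handle, so it only works
on seekable files, not on pipes.

```perl
use strict;
use warnings;

# Known BOM byte sequences, longest first: the UTF-32LE BOM begins with
# the same two bytes as the UTF-16LE BOM, so order matters.
# Caveat: UTF-16LE text whose first character is U+0000 would be
# mistaken for UTF-32LE.
my @BOMS = (
    [ "\x00\x00\xFE\xFF", 'UTF-32BE' ],
    [ "\xFF\xFE\x00\x00", 'UTF-32LE' ],
    [ "\xEF\xBB\xBF",     'UTF-8'    ],
    [ "\xFE\xFF",         'UTF-16BE' ],
    [ "\xFF\xFE",         'UTF-16LE' ],
);

# Hypothetical helper: open $path for reading, choose the decoding layer
# from the BOM (if any), and leave the handle positioned after the BOM.
# Returns the handle and the name of the encoding that was chosen.
sub open_by_bom {
    my ($path, $default) = @_;
    $default = 'UTF-8' unless defined $default;   # assumed fallback

    # Start with a plain byte handle so nothing is decoded yet.
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    my $got = read($fh, my $head, 4);
    die "Cannot read $path: $!" unless defined $got;

    my ($enc, $skip) = ($default, 0);
    for my $entry (@BOMS) {
        my ($bom, $name) = @$entry;
        if (substr($head, 0, length $bom) eq $bom) {
            ($enc, $skip) = ($name, length $bom);
            last;
        }
    }

    # Rewind to just after the BOM (or to the start if there was none),
    # then decode everything from that point on.
    seek($fh, $skip, 0) or die "Cannot seek in $path: $!";
    binmode($fh, ":encoding($enc)") or die "Cannot set encoding $enc: $!";
    return ($fh, $enc);
}
```

The returned encoding name is what gives you the "same mode as that handle"
for output: something like open(my $out, ">:raw:encoding($enc)", $file).
Note that the explicit-endian encodings such as UTF-16LE do not write a BOM
themselves, so you would print "\x{FEFF}" first if you want one. For newline
translation on Windows, the :crlf layer has to sit above the encoding layer,
as in "<:raw:encoding(UTF-16LE):crlf", so that the conversion is done on
characters rather than on the encoded bytes.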