It might be worthwhile to investigate your UTF-16 input data file in hex before deciding what needs to be done to read it properly in Perl. Presumably, if you'll have lots of files of this flavor, they'll be consistent in relevant details, so you only need to check one at the outset, to understand what's really going on. Does the file have line terminations like this:
    0d 00 0a 00    (i.e., <CR> <LF> encoded as little-endian UTF-16)

Also, if you are using Perl to write UTF-16 data to a file handle, you'll only get a BOM (and only your machine's _native_ byte order) when you specify the encoding as "UTF-16". If you say "UTF-16LE", you override your machine's native byte order (if necessary), and you don't get a BOM unless you explicitly write it yourself.

As for line termination patterns on output, you probably need to control that separately, either by setting "$\" or by using the ":crlf" IO layer. (Are you trying to write platform-independent code, or are you just trying to cope with a specific platform?)

As for the code you posted at the top of this thread, note that "\x{fffe}" is the code point for "no such character" -- i.e., it is the one code point that is specifically left undefined/unassigned/unused so that the BOM code point "\x{feff}" will always work the way it is supposed to. The "\x{HHHH}" notation in Perl refers to code points, not to 16-bit encodings of characters. To write a correct BOM, you have to use "\x{feff}", no matter what your output encoding layer may be.

There are other things I would suggest changing in the code you posted: improving the way error conditions are handled, using "slurp" mode for reading the input data, and fixing the regex substitution, which looks pretty broken (the BOM is wrong, and the captured strings are deleted instead of being included in the replacement string).

David Graff
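To make the BOM and line-ending points concrete, here is a small sketch (the file names are just placeholders, and the ":crlf" layer is only one way to get CR LF -- setting "$\" to "\r\n" is another):

```perl
use strict;
use warnings;

# With "UTF-16", Perl writes the BOM for you, in native byte order:
open my $native, '>:encoding(UTF-16)', 'native.txt'
    or die "native.txt: $!";
print $native "hello\n";
close $native;

# With "UTF-16LE", no BOM is written; if you want one, you must print
# U+FEFF yourself -- as a code point, not as a pair of bytes:
open my $le, '>:raw:encoding(UTF-16LE):crlf', 'le.txt'
    or die "le.txt: $!";
print $le "\x{feff}";   # correct BOM: the code point U+FEFF, not U+FFFE
print $le "hello\n";    # the :crlf layer turns "\n" into "\r\n"
close $le;
```

Note the layer order on the second open: ":crlf" sits above ":encoding(UTF-16LE)", so the CR is added before the characters are encoded; that way each of CR and LF comes out as a proper 16-bit code unit rather than a stray byte.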
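And since I don't have your code in front of me as I write this, here is just a sketch of the slurp-and-substitute shape I had in mind (again with a placeholder file name):

```perl
use strict;
use warnings;

# Slurp mode: localizing $/ to undef makes <$in> read the whole file at once.
open my $in, '<:raw:encoding(UTF-16LE):crlf', 'input.txt'
    or die "input.txt: $!";
my $data = do { local $/; <$in> };
close $in;

# Strip a leading BOM if present -- match \x{feff}, not \x{fffe}:
$data =~ s/\A\x{feff}//;

# If your substitution captures text you mean to keep, the captures
# ($1, $2, ...) have to appear in the replacement string, otherwise
# the captured text is deleted along with the rest of the match.
```

The same check-and-die error handling on the open call applies to every file operation in your script, not just the first one.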