Re: UTF-16LE fails in substitution

2005-09-23 Thread David Graff
Minor tweak to the regex substitution in that "subroutine version" that I posted previously -- it should be: $editdata =~ s/(\x{feff}?(?:<\?.*?\?>\s*)*)/$1$comment/s; (note the optional BOM at the beginning of the match pattern -- if the data contains a BOM, then $comment will be inserted a
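For anyone following along, here is a minimal sketch of that substitution in context. The sub name add_comment_after_prolog and the sample strings are made up for illustration and are not from the original post; the input is assumed to be an already-decoded character string, not raw bytes.

```perl
use strict;
use warnings;

# Hypothetical helper (the name is illustrative, not from the thread).
# Takes a decoded XML string and a comment, returns the edited string.
sub add_comment_after_prolog {
    my ($editdata, $comment) = @_;
    # Capture an optional BOM plus any leading <?...?> instructions,
    # then put the comment right after whatever was captured.
    $editdata =~ s/(\x{feff}?(?:<\?.*?\?>\s*)*)/$1$comment/s;
    return $editdata;
}

my $comment  = "<!-- build 1.2.3 -->\n";    # sample value
my $with_bom = add_comment_after_prolog(
    "\x{feff}<?xml version=\"1.0\"?>\n<root/>\n", $comment);

# Show the BOM as visible text so the output stays plain ASCII.
(my $shown = $with_bom) =~ s/\x{feff}/[BOM]/;
print $shown;
```

Because every part of the pattern is optional, the substitution also succeeds on data with no BOM and no XML declaration; in that case the comment simply goes at the very start of the string.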

Re: UTF-16LE fails in substitution

2005-09-22 Thread David Graff
[EMAIL PROTECTED] said:
> I may be a little confused here still. The help that I included in the first post said that "UTF-16 itself can be used for in-memory computations, but if storage or transfer is required either UTF-16BE (big-endian) or UTF-16LE (little-endian) encodings must be chos
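A small sketch of how the Encode module treats the BOM under these encoding names may help here. The sample bytes are invented for illustration; the point is only that "UTF-16" consumes the BOM on input and writes one on output, while "UTF-16LE" does neither.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Render a byte string as space-separated hex, e.g. "ff fe 3c 00".
sub hex_bytes {
    my ($octets) = @_;
    return join ' ', map { sprintf '%02x', $_ } unpack 'C*', $octets;
}

# Sample bytes: a little-endian BOM (FF FE) followed by "<" in UTF-16LE.
my $bytes = "\xFF\xFE<\x00";

# "UTF-16" reads the BOM to pick the byte order and drops it.
my $via_utf16 = decode('UTF-16', $bytes);

# "UTF-16LE" assumes the byte order, so the BOM survives as U+FEFF.
my $via_utf16le = decode('UTF-16LE', $bytes);

printf "decode UTF-16:   %d char(s), first is U+%04X\n",
    length($via_utf16), ord($via_utf16);
printf "decode UTF-16LE: %d char(s), first is U+%04X\n",
    length($via_utf16le), ord($via_utf16le);

# On output, "UTF-16" writes a big-endian BOM; "UTF-16LE" writes none.
printf "encode UTF-16:   %s\n", hex_bytes(encode('UTF-16',   '<'));
printf "encode UTF-16LE: %s\n", hex_bytes(encode('UTF-16LE', '<'));
```

If a file that starts with a BOM is read as UTF-16LE, the decoded string therefore begins with U+FEFF, and a pattern anchored at the start of the data has to allow for it.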

Re: UTF-16LE fails in substitution

2005-09-21 Thread Steve Larson
Thanks David and Dan. Comments inline.
"David Graff" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
> It might be worthwhile to investigate your UTF-16 input data file in hex before deciding what needs to be done to read it properly in Perl. Presumably, if you'll have lots of fi

Re: UTF-16LE fails in substitution

2005-09-17 Thread David Graff
It might be worthwhile to investigate your UTF-16 input data file in hex before deciding what needs to be done to read it properly in Perl. Presumably, if you'll have lots of files of this flavor, they'll be consistent in relevant details, so you only need to check one at the outset, to unders
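One way to do that hex check from Perl itself is sketched below. The helper names head_hex and bom_kind are hypothetical, and looking at the BOM is my assumption about what is worth inspecting; the snippet above does not say which details matter.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first $n bytes of a file as hex,
# e.g. "ff fe 3c 00". The file is opened raw so no decoding happens.
sub head_hex {
    my ($path, $n) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    my $got = read($fh, my $buf, $n);
    die "Read failed on $path: $!" unless defined $got;
    close($fh);
    return join ' ', map { sprintf '%02x', $_ } unpack 'C*', $buf;
}

# Hypothetical helper: classify the leading bytes by their BOM, if any.
# Caveat: "ff fe 00 00" is also the UTF-32LE BOM; not handled here.
sub bom_kind {
    my ($hex) = @_;
    return 'UTF-16LE BOM' if $hex =~ /^ff fe/;
    return 'UTF-16BE BOM' if $hex =~ /^fe ff/;
    return 'UTF-8 BOM'    if $hex =~ /^ef bb bf/;
    return 'no BOM';
}

# Usage: perl script.pl some-file.xml
if (@ARGV) {
    my $hex = head_hex($ARGV[0], 16);
    print "$hex\n", bom_kind($hex), "\n";
}
```

A file with no BOM but with a zero byte after each ASCII character (3c 00 3f 00 ...) is most likely UTF-16LE without a BOM; zeros before each character suggest UTF-16BE.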

Re: UTF-16LE fails in substitution

2005-09-16 Thread Steve Larson
"Dan Kogai" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
> On Sep 15, 2005, at 07:05, Steve Larson wrote:
> > What I want to do is add a version string comment at the beginning of .xml files. I test to see if the file is UNICODE (Encode::Unicode) or ASCII (Encode:

Re: UTF-16LE fails in substitution

2005-09-15 Thread Dan Kogai
On Sep 15, 2005, at 07:05, Steve Larson wrote:
> What I want to do is add a version string comment at the beginning of .xml files. I test to see if the file is UNICODE (Encode::Unicode) or ASCII (Encode::XS) using guess_encoding. My ASCII case works fine but the regexp for the UNICODE case
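For reference, the detection step described here might look roughly like the sketch below. The helper name guessed_name and the sample byte strings are mine, not from the original post, and the original script may well have been structured differently.

```perl
use strict;
use warnings;
use Encode::Guess;    # exports guess_encoding

# Hypothetical helper: return the guessed encoding name, or die with
# the message guess_encoding returns when it cannot decide.
sub guessed_name {
    my ($octets) = @_;
    my $enc = guess_encoding($octets);
    # On success this is an encoding object; on failure a plain string.
    die "Cannot guess: $enc\n" unless ref $enc;
    return $enc->name;
}

# Plain ASCII sample.
print guessed_name("<root/>\n"), "\n";

# Sample starting with a little-endian BOM (FF FE), then "<r" in UTF-16LE.
print guessed_name("\xFF\xFE<\x00r\x00"), "\n";
```

With the default suspects, guess_encoding only recognizes UTF-16 data by its BOM, so the name it reports for such a file is "UTF-16" rather than "UTF-16LE".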