From: Axel Mock
Sent: 6/19/2006 7:59:48 AM
To: [EMAIL PROTECTED]
Cc: activeperl@listserv.ActiveState.com
Subject: Re: auto-detecting file encoding
Hi,
I was just reading this thread concerning detecting/guessing Unicode while I
was debugging my little module that, among other file-related things, should
read in a file and convert it to internal UTF-8.
Things I came across:
Encode::Guess was obviously written with non-UTF input in mind
Hello:
The problem is that Unicode files do not have to contain the byte order mark
(BOM) header, and in fact, the ones I'm attempting to decode don't. As I
said before, most of them are Windows-1252, some are Latin-1, and still
others are UTF-8, without a BOM. The problem
is that o
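For BOM-less files like these, one common heuristic (not something proposed in this thread) is to try a strict UTF-8 decode first and fall back to Windows-1252, then Latin-1. Below is a minimal Python sketch; the function name `decode_without_bom` is hypothetical. It relies on non-ASCII Windows-1252/Latin-1 text rarely being valid UTF-8, so the result is a guess, not a guarantee.

```python
from typing import Tuple


def decode_without_bom(data: bytes) -> Tuple[str, str]:
    """Guess the encoding of BOM-less bytes; return (text, encoding used)."""
    # Try strict UTF-8 first: a non-ASCII Windows-1252/Latin-1 byte such as
    # 0xE9 ("é") is usually not followed by the continuation bytes UTF-8
    # requires, so such files typically fail here.
    for encoding in ("utf-8", "cp1252"):
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Python's cp1252 codec rejects the undefined bytes 0x81, 0x8D, 0x8F,
    # 0x90 and 0x9D; Latin-1 maps every byte to a code point and never fails.
    return data.decode("latin-1"), "latin-1"
```

Windows-1252 and Latin-1 differ only in bytes 0x80 to 0x9F (rarely used control characters in Latin-1), so this sketch reports most Latin-1 files as cp1252; for printable Latin-1 text the decoded result is the same.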
On Jun 18, 2006, at 22:06, Jerry Yang wrote:
> Hi,
> A UTF-8 file should have a BOM like this: "EF BB BF"
>
> Bytes         Encoding Form
> 00 00 FE FF   UTF-32, big-endian
> FF FE 00 00   UTF-32, little-endian
> FE FF         UTF-16, big-endian
> FF FE         UTF-16, little-endian
> EF BB BF      UTF-8
Should, but don't have t
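The byte sequences in the quoted table can be checked mechanically. Below is a minimal Python sketch (the names `BOMS` and `sniff_bom` are hypothetical) that tests the longest signatures first, because the UTF-32 little-endian BOM begins with the UTF-16 little-endian one. As the reply points out, finding no BOM proves nothing: the file may still be UTF-8.

```python
from typing import Optional, Tuple

# BOM signatures from the quoted table, ordered longest first: FF FE 00 00
# (UTF-32LE) starts with FF FE (UTF-16LE), so checking the shorter one first
# would misreport UTF-32LE data as UTF-16LE.
BOMS = [
    (b"\x00\x00\xfe\xff", "utf-32-be"),
    (b"\xff\xfe\x00\x00", "utf-32-le"),
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]


def sniff_bom(data: bytes) -> Optional[Tuple[str, int]]:
    """Return (Python codec name, BOM length) for a known BOM, else None."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None  # no BOM: the data may still be UTF-8 (or anything else)
```

With a match, the text is `data[length:].decode(encoding)`, i.e. the BOM is skipped before decoding.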
Hi,
In Windows, each Unicode file has a special header. The following headers
are in use:
UTF-16LE: \xFF\xFE
UTF-16BE: \xFE\xFF
UTF-8: \xEF\xBB\xBF
For an automatic check, open the file in binary mode, read the first 3 bytes,
and compare them with the patterns above. If they do not match any of the patterns,
win
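The check described in this message can be sketched in Python as follows. This is an illustration under assumptions: the names `HEADERS`, `match_header` and `check_header` are hypothetical, and pairing each header with a BOM-stripping Python codec (`utf-8-sig`, `utf-16`) goes beyond what the message says.

```python
import codecs
from typing import Optional

# The three headers listed in the message, each paired with a Python codec
# that recognizes the header and strips it while decoding.
HEADERS = [
    (codecs.BOM_UTF8, "utf-8-sig"),   # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16"),  # FF FE (little-endian)
    (codecs.BOM_UTF16_BE, "utf-16"),  # FE FF (big-endian)
]


def match_header(head: bytes) -> Optional[str]:
    """Return a codec name if head starts with one of HEADERS, else None."""
    # Caveat: FF FE is also the start of the UTF-32LE BOM (FF FE 00 00),
    # which a 3-byte check cannot reliably tell apart from UTF-16LE.
    for bom, codec in HEADERS:
        if head.startswith(bom):
            return codec
    return None


def check_header(path: str) -> Optional[str]:
    """Open the file in binary mode and compare its first 3 bytes."""
    with open(path, "rb") as f:  # binary mode: bytes are read unmodified
        head = f.read(3)
    return match_header(head)
```

A `None` result means the file has none of these headers, which, as the rest of the thread notes, is common even for UTF-8 files, so some other guess is still needed.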