On Sat, May 21, 2011 at 12:10 PM, Eli Orr (Office) eli@logodial.com wrote:
Dear PHP Gurus,
I have a debate on the following; please let me know what is true and what
is false.
I'm using a PHP function *is_UTF_8_file($file_name)* that I've found as
part of my PHP 5.3 installation.
This function checks whether the file starts with the 3 UTF-8 BOM bytes.
However, another guy told me that there is a way to detect whether a file is
UTF-8 without having the BOM at the start of the file.
To me it sounds impossible, since without this indication you have only a
stream of bytes, and you can never tell 100% whether it is UTF-8 or
something else.
Who is right here?
If there is a magical function that can detect whether files without a BOM
are UTF-8 or not, please share your knowledge, if this
is not a NULL or impossible function as I thought.
Here's a great write-up I've got bookmarked (he points out Windows Notepad
automatically determines the encoding):
http://codesnipers.com/?q=node/68
- If it's an XML file, the structure allows you to determine the encoding.
- For other files, you can try to decode the bytes as UTF-8 and look for
invalid byte sequences.
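To make the second point concrete, here is a minimal sketch of that validity check. The name looks_like_utf8() is made up for illustration (it is not a PHP built-in), and the sketch assumes the file fits in memory. It uses mb_check_encoding() when the mbstring extension is loaded and otherwise falls back to PCRE's /u modifier, which rejects subjects that are not valid UTF-8.

```php
<?php
// Hypothetical helper, not part of PHP: true when $bytes is well-formed UTF-8.
function looks_like_utf8($bytes)
{
    // A leading UTF-8 BOM (EF BB BF) is optional, so skip it if present.
    if (strncmp($bytes, "\xEF\xBB\xBF", 3) === 0) {
        $bytes = (string) substr($bytes, 3);
    }
    if (function_exists('mb_check_encoding')) {
        // Requires the mbstring extension.
        return mb_check_encoding($bytes, 'UTF-8');
    }
    // Fallback: with the /u modifier, preg_match() fails on invalid UTF-8.
    return preg_match('//u', $bytes) === 1;
}

// Usage sketch: $path is an assumed variable holding the file name.
// $is_utf8 = looks_like_utf8(file_get_contents($path));
```

Passing this check is strong evidence rather than proof: a pure ASCII file is also valid UTF-8, and a file in another encoding can occasionally happen to form valid sequences. Failing the check, on the other hand, does show that the file is not UTF-8.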
As far as a PHP function that already does this, I'm not aware of one, but
you could make a system call to file if you're on Linux, as it tries to
automatically determine the encoding:
http://linux.die.net/man/1/file
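A rough sketch of that system call, assuming the file utility is on the PATH and shell_exec() is not disabled. The function name guess_encoding_with_file() is made up for illustration; -b (brief output) and --mime-encoding are options of file itself.

```php
<?php
// Hypothetical helper: returns file(1)'s encoding guess, or null on failure.
function guess_encoding_with_file($path)
{
    // escapeshellarg() quotes the path so the shell sees one safe argument;
    // "--" stops option parsing in case the name starts with a dash.
    $cmd = 'file -b --mime-encoding -- ' . escapeshellarg($path);
    $out = shell_exec($cmd);
    // shell_exec() does not return a string when the command fails
    // or produces no output.
    return is_string($out) ? trim($out) : null;
}

// Usage sketch: $path is an assumed variable holding the file name.
// $encoding = guess_encoding_with_file($path);
```

The result is a short label such as utf-8 or us-ascii. It is still a guess based on the file's contents, so treat it as a hint rather than a guarantee.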
Adam
--
Nephtali: A simple, flexible, fast, and security-focused PHP framework
http://nephtaliproject.com