On Sat, May 21, 2011 at 12:10 PM, Eli Orr (Office) <eli....@logodial.com> wrote:

>
> Dear PHP Gurus,
>
> I have a debate on the following; please let me know what is true and
> what is false.
>
> I am using a PHP function, *is_UTF_8_file ($file_name)*, that I found as
> part of my PHP 5.3 installation.
> This function checks whether the file starts with the 3 UTF-8 BOM bytes.
>
> However, another guy told me that there is a way to detect whether a file
> is UTF-8 without having the BOM at the start of the file.
> To me that sounds impossible: without this indication you have a stream
> of bytes, and you can never tell with 100% certainty whether it is UTF-8
> or something else.
>
> Who is right here?
> If there is a magical function that can detect whether files without a BOM
> are UTF-8 or not, please share your knowledge, unless this
> is a "NULL" or impossible function, as I thought.
>

Here's a great write-up I've got bookmarked (he points out Windows Notepad
automatically determines the encoding):
http://codesnipers.com/?q=node/68

   - If it's an XML file, the structure allows you to determine the encoding.
   - For other files, you can try to decode the contents as UTF-8 and look
   for invalid byte sequences.
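To make the second point concrete, here is a minimal sketch of a BOM-less check. The function names are my own (hypothetical, not part of PHP), and it leans on the fact that preg_match() with the "u" modifier refuses subjects that are not well-formed UTF-8. Note the asymmetry: an invalid byte sequence proves the file is not UTF-8, while a clean pass only means the bytes are consistent with UTF-8 (plain ASCII passes too), so it is strong evidence rather than 100% proof.

```php
<?php
// Hypothetical helpers for illustration; these names are not built into PHP.

// The three bytes of the UTF-8 byte order mark.
const UTF8_BOM = "\xEF\xBB\xBF";

// True if the byte string starts with the UTF-8 BOM.
function has_utf8_bom($bytes)
{
    return strncmp($bytes, UTF8_BOM, 3) === 0;
}

// True if the byte string is well-formed UTF-8, with or without a BOM.
function looks_like_utf8($bytes)
{
    if (has_utf8_bom($bytes)) {
        // Drop the BOM; the cast guards against older PHP versions
        // returning false when nothing is left after it.
        $bytes = (string) substr($bytes, 3);
    }
    // With the "u" modifier PCRE validates the subject as UTF-8 first:
    // preg_match() returns 1 for valid input and false for invalid input.
    return preg_match('//u', $bytes) === 1;
}
```

Usage would be something like looks_like_utf8(file_get_contents($file_name)). For very large files you would probably want to read in chunks instead, taking care not to split a multi-byte sequence at a chunk boundary.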


As far as a PHP function that already does this, I'm not aware of one, but
you could make a system call to "file" if you're on Linux, as it tries to
automatically determine the encoding:
http://linux.die.net/man/1/file
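
A rough sketch of that system call from PHP, assuming file(1) is installed, supports the --brief and --mime-encoding options, and that shell_exec() is permitted on your host. The helper name is hypothetical, and the result is only file's guess, so treat it as a hint:

```php
<?php
// Hypothetical helper for illustration; not a built-in PHP function.
// Returns file(1)'s encoding guess for $path, or null if no answer came back.
function guess_encoding_with_file($path)
{
    // Bail out if shell_exec() is not defined in this PHP build.
    if (!function_exists('shell_exec')) {
        return null;
    }
    // --brief omits the file name, --mime-encoding prints only the encoding;
    // escapeshellarg() keeps unusual file names from breaking the command.
    $cmd = 'file --brief --mime-encoding ' . escapeshellarg($path) . ' 2>/dev/null';
    $out = shell_exec($cmd);
    if (!is_string($out)) {
        return null;
    }
    $out = trim($out);
    return $out === '' ? null : $out;
}
```

Calling guess_encoding_with_file($file_name) should give back a short string naming the guessed encoding, or null when the command is unavailable.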

Adam

-- 
Nephtali:  A simple, flexible, fast, and security-focused PHP framework
http://nephtaliproject.com
