[PHP] a Debate here - How can you check if a file is UTF-8 without the BOM using PHP?

2011-05-21 Thread Eli Orr (Office)


Dear PHP Gurus,

I have a debate on the following; please let me know what is true and what is false.

I'm using a PHP function *is_UTF_8_file($file_name)* that I've found 
as part of my PHP 5.3 installation.

This function checks whether the file starts with the 3 UTF-8 BOM bytes (EF BB BF).

However, another guy told me that there is a way to detect whether a file is 
UTF-8 without having the BOM at the start of the file.
To me it sounds impossible: without this indication you just have a stream 
of bytes, and you can never tell with 100% certainty whether it is UTF-8 or 
something else.


Who is right here?
If there is a magical function that can detect whether a file without a BOM 
is UTF-8 or not, please share your knowledge, assuming this is not a null or 
impossible function as I thought.

Many thanks for your wise advice.

--
Best Regards,

*Eli Orr*
*LogoDial Ltd.*
Email: _Eli.Orr@LogoDial.com_
Skype: _eliorr.com_


Re: [PHP] a Debate here - How can you check if a file is UTF-8 without the BOM using PHP?

2011-05-21 Thread Adam Richardson
On Sat, May 21, 2011 at 12:10 PM, Eli Orr (Office) eli@logodial.com wrote:


> Dear PHP Gurus,
>
> I have a debate on the following; please let me know what is true and what is false.
>
> I'm using a PHP function *is_UTF_8_file($file_name)* that I've found as
> part of my PHP 5.3 installation.
> This function checks whether the file starts with the 3 UTF-8 BOM bytes.
>
> However, another guy told me that there is a way to detect whether a file is
> UTF-8 without having the BOM at the start of the file.
> To me it sounds impossible: without this indication you just have a stream
> of bytes, and you can never tell with 100% certainty whether it is UTF-8 or
> something else.
>
> Who is right here?
> If there is a magical function that can detect whether a file without a BOM
> is UTF-8 or not, please share your knowledge, assuming this
> is not a null or impossible function as I thought.


Here's a great write-up I've got bookmarked (he points out Windows Notepad
automatically determines the encoding):
http://codesnipers.com/?q=node/68

   - If it's an XML file, the structure allows you to determine the encoding.
   - For other files, you can try to decode the contents as UTF-8 and look
   for invalid byte sequences.
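
To make the second point concrete, here is a minimal sketch of such a validity check. The helper name looks_like_utf8() is made up for illustration and is not a PHP built-in; it relies on the fact that preg_match() with the /u modifier fails on subjects that are not valid UTF-8. If the mbstring extension is loaded, mb_check_encoding($bytes, 'UTF-8') should serve the same purpose. Note that a passing result only means the bytes are consistent with UTF-8 (plain ASCII passes too), so it is strong evidence rather than 100% proof.

```php
<?php
// Hypothetical helper (not part of PHP): returns true when $bytes is a
// well-formed UTF-8 byte sequence, with or without a leading BOM.
function looks_like_utf8($bytes)
{
    // The UTF-8 BOM (EF BB BF) is optional; skip it if present.
    if (strncmp($bytes, "\xEF\xBB\xBF", 3) === 0) {
        $bytes = (string) substr($bytes, 3);
    }

    // With the /u modifier PCRE validates the subject as UTF-8 first.
    // An empty pattern matches any valid string (result 1); an invalid
    // byte sequence makes preg_match() return false instead.
    return preg_match('//u', $bytes) === 1;
}
```

For a file, something like looks_like_utf8(file_get_contents($file_name)) would do, assuming the file is small enough to read into memory.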


As far as a PHP function that already does this, I'm not aware of one, but
you could make a system call to the file command if you're on Linux, as it
tries to automatically determine the encoding:
http://linux.die.net/man/1/file
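
A rough sketch of that system call, assuming a reasonably recent file(1) that supports the --brief and --mime-encoding options and a PHP setup where shell_exec() is not disabled. The function name guess_charset_with_file() is invented for this example.

```php
<?php
// Hypothetical helper: asks the file(1) command for its charset guess.
// Returns a string such as "utf-8" or "us-ascii", or null when the
// command produced no output (for example, when file is not installed).
function guess_charset_with_file($path)
{
    // --brief omits the filename prefix; --mime-encoding prints only the
    // detected charset. escapeshellarg() guards against odd characters
    // in the path.
    $cmd = 'file --brief --mime-encoding ' . escapeshellarg($path);
    $out = shell_exec($cmd);

    return is_string($out) ? trim($out) : null;
}
```

The result is file's best guess from the bytes it sampled, so it is worth treating as a hint rather than a guarantee.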

Adam

-- 
Nephtali:  A simple, flexible, fast, and security-focused PHP framework
http://nephtaliproject.com