Re: Programmatically determining encoding of a file

Harrie Westphal Tue, 03 Oct 2006 10:24:35 -0700

On Oct 2, 2006, D Jungk wrote:

I use a shell command, "file", on Linux, and since the Macintosh runs on Unix, I imagine something similar might work.


On Oct 3, 2006, at 10:24 AM, Phil M wrote:

You can use a validator for the Unicode text, and it should tell you which Unicode type it is (UTF8, UTF16BE, or UTF16LE).
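A rough Python illustration of that kind of validator follows (the thread is about REALbasic, so this only shows the idea). It assumes "validator" means: check for a byte order mark (BOM) first, then try a strict UTF-8 decode. The function name `detect_unicode_encoding` is hypothetical, and UTF-16 without a BOM is deliberately left undecided.

```python
import codecs
from typing import Optional


def detect_unicode_encoding(data: bytes) -> Optional[str]:
    """Return "utf-8", "utf-16-be" or "utf-16-le" when the bytes can be
    identified as such, otherwise None (hypothetical helper)."""
    # A byte order mark at the start of the data is decisive.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    # No BOM: NUL bytes hint at UTF-16 (each ASCII character gains a zero
    # byte), but without a BOM the byte order is not known, so give up.
    if b"\x00" in data:
        return None
    # UTF-8 has strict byte-sequence rules, so a clean strict decode is
    # strong evidence. Pure ASCII also passes; ASCII is a subset of UTF-8.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return None
    return "utf-8"
```

A Latin-1 file containing "é" (byte 0xE9) fails the strict UTF-8 decode and returns None, which is the case where a caller would need some other hint.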

With some further testing I find that even though I tell my program that the incoming file is UTF8, it seems to read and process the file just fine so long as it is anything other than UTF16 (MacRoman, ISOLatin1, WindowsLatin1, etc.). This is probably because the file consists almost entirely of characters in the normal ASCII character set, and those characters are encoded as the same single bytes in all of these encodings, UTF8 included. My program looks for a particular sequence of characters in the very first record, and the only time this fails is when I receive the file in UTF16 format. So I can look for those characters using the UTF8 encoding, and if I don't find them, try again using UTF16; things should work okay. I'll just cross my fingers that the info that comes in the file stays within the ASCII range.
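The marker-based fallback described above might be sketched in Python like this. The function name and the marker string used in the example are hypothetical. The sketch searches the whole decoded buffer rather than only the first record, and it tries UTF-16 in both byte orders because a file without a BOM does not say which one it uses. `utf-8-sig` is Python's UTF-8 codec that also strips an optional leading BOM.

```python
from typing import Optional, Sequence

# UTF-8 goes first: ASCII-only MacRoman, ISO Latin-1 and Windows Latin-1
# files are byte-identical to UTF-8, so this attempt covers them too.
DEFAULT_CANDIDATES = ("utf-8-sig", "utf-16-le", "utf-16-be")


def guess_encoding_by_marker(data: bytes, marker: str,
                             candidates: Sequence[str] = DEFAULT_CANDIDATES
                             ) -> Optional[str]:
    """Return the first candidate encoding whose decoded text contains
    the expected marker, or None (hypothetical helper)."""
    for enc in candidates:
        try:
            text = data.decode(enc)  # strict: invalid bytes raise
        except UnicodeDecodeError:
            continue  # e.g. odd byte count for UTF-16, or bad UTF-8
        # Decoding UTF-16 bytes as UTF-8 (or with the wrong byte order)
        # yields interleaved NULs or unrelated characters, so the marker
        # is only found under the matching encoding.
        if marker in text:
            return enc
    return None
```

For example, a first record beginning with a hypothetical marker `HDR1` saved as UTF-16LE would fail the `utf-8-sig` check (the text decodes as "H\0D\0R\01\0...") and succeed on `utf-16-le`.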

It's just that no matter what I throw at BBEdit, it seems to be able to ascertain the encoding of the file. I tried the Unix "file" command, and it will correctly report that the file is UTF8 or UTF16, but only if the BOM (byte order mark) characters appear at the beginning of the file, and that is not always the situation in my case. With files floating all over the world, I can't imagine that everyone always knows the encoding of a file they are receiving so as to be able to code their programs accordingly.
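For UTF-16 without a BOM, one common heuristic (not necessarily what BBEdit does, since its detection logic isn't described in this thread) relies on the same mostly-ASCII assumption. Each ASCII character in UTF-16 becomes its byte plus a zero byte, so zeros cluster in the second byte of each pair for little-endian and in the first byte for big-endian. The function name and the 0.6 threshold below are arbitrary assumptions.

```python
from typing import Optional


def guess_bomless_utf16(data: bytes, threshold: float = 0.6) -> Optional[str]:
    """Guess the byte order of BOM-less, mostly-ASCII UTF-16 data.

    Returns "utf-16-le", "utf-16-be", or None if the pattern is unclear
    (hypothetical helper; threshold chosen arbitrarily).
    """
    # UTF-16 code units are two bytes long, so the length must be even.
    if len(data) < 2 or len(data) % 2 != 0:
        return None
    units = len(data) // 2
    # Fraction of zero bytes in the first and second byte of each unit.
    first_zero = data[0::2].count(0) / units
    second_zero = data[1::2].count(0) / units
    if second_zero >= threshold and first_zero < threshold:
        guess = "utf-16-le"  # ASCII looks like b"H\x00i\x00"
    elif first_zero >= threshold and second_zero < threshold:
        guess = "utf-16-be"  # ASCII looks like b"\x00H\x00i"
    else:
        return None
    # Confirm the guess decodes cleanly (rejects unpaired surrogates).
    try:
        data.decode(guess)
    except UnicodeDecodeError:
        return None
    return guess
```

Text that is mostly non-Latin (for example CJK) has few zero bytes in either position, so this heuristic returns None for it; it only helps in the mostly-ASCII situation described above.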


=== A Mac addict in Tennessee ===
