On Oct 2, 2006, D Jungk wrote:

I use a shell command, "file", on Linux, and since the Macintosh runs on Unix, I imagine something similar might work.

On Oct 3, 2006, at 10:24 AM, Phil M wrote:

You can use a validator for the Unicode text, and it should tell you which Unicode type it is (UTF8, UTF16BE, or UTF16LE).

With some further testing I find that even when I tell my program that the incoming file is UTF8, it reads and processes the file just fine as long as the file is in anything other than UTF16 (MacRoman, ISOLatin1, WindowsLatin1, etc.). This is probably because the file consists almost entirely of characters in the normal ASCII range, which are the same single bytes in all of those encodings.

My program looks for a particular sequence of characters in the very first record, and the only time this fails is when I receive the file in UTF16. So I can look for those characters using UTF8 and, if I don't find them, try again using UTF16, and things should work okay. I'll just cross my fingers that the data in the file stays within the ASCII range.
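The fallback described above (look for the known characters as UTF8, then retry as UTF16) could look roughly like the sketch below. It is in Python rather than REALbasic, and the function name `decode_by_marker`, the `marker` argument, and the candidate list are all made up for illustration. It assumes the first record ends at a newline and that the marker is plain ASCII.

```python
from typing import Optional, Tuple

# Hypothetical candidate list. "utf-8-sig" is UTF-8 with an optional BOM;
# the two UTF-16 codecs are tried explicitly so the result does not depend
# on the byte order of the machine running the code.
CANDIDATE_ENCODINGS = ("utf-8-sig", "utf-16-le", "utf-16-be")


def decode_by_marker(
    data: bytes, marker: str, encodings=CANDIDATE_ENCODINGS
) -> Tuple[Optional[str], Optional[str]]:
    """Return (encoding, text) for the first candidate encoding under which
    the first record contains `marker`, or (None, None) if none matches."""
    for enc in encodings:
        try:
            text = data.decode(enc)
        except UnicodeDecodeError:
            continue  # bytes are not valid in this encoding, try the next
        text = text.lstrip("\ufeff")  # drop a BOM the codec left in place
        first_record = text.split("\n", 1)[0]
        if marker in first_record:
            return enc, text
    return None, None
```

A mostly-ASCII MacRoman or Latin1 file will come back labelled as UTF8 here, which matches the behaviour described above: the ASCII bytes are identical, so the marker is found on the first attempt. Any non-ASCII bytes in such a file would make the UTF8 attempt fail, and the sketch does not try single-byte encodings after that.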

It's just that no matter what I throw at BBEdit, it seems able to ascertain the encoding of the file. I tried the Unix "file" command, and it properly reports that the file is UTF8 or UTF16, but only if the BOM appears at the beginning of the file, and that is not always the case for me. With files floating all over the world, I can't imagine that everyone always knows the encoding of a file they receive well enough to code their programs accordingly.
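For files without a BOM, one common heuristic is to check for the BOM first, then look at where the zero bytes fall, then test whether the bytes are valid UTF8. The Python sketch below shows that idea. It is not a claim about how BBEdit or the "file" command actually work, and the name `sniff_encoding`, the 4096-byte sample, and the 40% threshold are arbitrary choices for illustration. It assumes mostly-ASCII text, ignores UTF32, and cannot tell MacRoman, ISOLatin1, and WindowsLatin1 apart.

```python
import codecs
from typing import Optional


def sniff_encoding(data: bytes, sample_size: int = 4096) -> Optional[str]:
    """Guess 'utf-8-sig', 'utf-16', 'utf-16-le', 'utf-16-be' or 'utf-8'.
    Returns None when the bytes look like some unidentified 8-bit encoding."""
    # 1. A BOM, when present, settles the question.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # this codec reads the BOM to pick the byte order

    # 2. No BOM: in mostly-ASCII UTF-16 text every other byte is zero.
    sample = data[:sample_size]
    pairs = len(sample) // 2
    if pairs:
        even_nuls = sample[0::2].count(0)  # zero high bytes: big-endian
        odd_nuls = sample[1::2].count(0)   # zero high bytes: little-endian
        if odd_nuls > pairs * 0.4 and even_nuls == 0:
            return "utf-16-le"
        if even_nuls > pairs * 0.4 and odd_nuls == 0:
            return "utf-16-be"

    # 3. Strict UTF-8 validation; pure ASCII also passes this check.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return None
```

A result of None means only that the data is neither UTF16 nor valid UTF8; choosing among the single-byte encodings would need extra knowledge about the sender or the expected content.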

=== A Mac addict in Tennessee ===
