On 10/7/10 7:59 PM, Bob Sneidar wrote:
Okay, so that begs the question, if there is no difference between UTF8 and 
ASCII, why make the distinction? I mean, what would be the point of converting 
from ASCII to UTF8 or vice versa if the results were always the same?

Just being practical.

Some of us grew up in Britain in the 60s and 70s (Oh, how depressing) and remember the feeling of moving from short trousers to long trousers; as far as I understand ASCII and UTF8 are somehow the same without the place being trashed by the . . . . . (whoops, no politics) . . . those of you who want to understand my reference should watch "Carry On At Your Convenience"; a light, easily digestible introduction to the politics of the early 70s.

Bob


On Oct 6, 2010, at 1:29 PM, Jeff Massung wrote:

On Wed, Oct 6, 2010 at 3:23 PM, Richard Gaskin
<[email protected]> wrote:

I have an app that needs to auto-detect Unicode and plain text, and render
them correctly based on that auto-detection.

I have the UTF16 stuff working, but with UTF8 I have a problem:  there is
no BOM to let me know if it's Unicode, and some plain text files will
occasionally have high-ASCII values in them (like the dagger symbol).

What patterns should I be looking for in the binary data of a file to
distinguish UTF8 from plain text?


Sorry, Richard, but I believe you are out of luck here. The idea behind UTF8
is that it's indistinguishable from ASCII (0-127). You may be able to scan
the files, and if they are large enough, try to deduce something from them
to know which they are. For example:

On Windows, "\r\n" (13, 10) should terminate lines. Could very well be a
text file.

In ASCII text there will never be a NUL byte (byte 0) anywhere. There are
likely many 0-byte values in any appreciably large Unicode file. This would
also be true of byte 8 (backspace) and byte 7 (the bell) and probably a few
others.
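
A rough sketch of that control-byte check, in Python. The function name
control_byte_counts is made up for illustration, and the three bytes it
looks at are just the ones mentioned above. Note that zero bytes turn up
in UTF-16 and UTF-32 data; UTF-8 text does not contain them unless the
text itself includes a NUL character, so this check will not separate
UTF8 from plain text on its own.

```python
def control_byte_counts(data: bytes) -> dict:
    """Count NUL, bell, and backspace bytes in a raw buffer.

    Hypothetical helper: plain text should normally score zero on all
    three, while UTF-16 text is full of zero bytes.
    """
    suspects = (("nul", 0), ("bell", 7), ("backspace", 8))
    return {name: data.count(bytes([value])) for name, value in suspects}


if __name__ == "__main__":
    print(control_byte_counts("hi".encode("utf-16-le")))
    print(control_byte_counts("hi".encode("utf-8")))
```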

If the number of bytes that have the high bit (0x80) set is extremely low
(<< 1%) then most likely it's ASCII.
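
Here is a minimal sketch of the high-bit count, again in Python. It also
adds one check that goes beyond the heuristics above, so treat it as an
assumption: a strict UTF-8 decode. Multi-byte UTF-8 sequences have to
follow a fixed lead-byte/continuation-byte pattern, so a stray high byte
from a single-byte encoding (a dagger at 0x86, say) will usually fail
that decode. The function names and the returned labels are made up for
illustration.

```python
def high_bit_fraction(data: bytes) -> float:
    """Fraction of bytes in the buffer with the high bit (0x80) set."""
    if not data:
        return 0.0
    return sum(b >= 0x80 for b in data) / len(data)


def decodes_as_utf8(data: bytes) -> bool:
    """True if the whole buffer is well-formed UTF-8."""
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True


def guess_encoding(data: bytes) -> str:
    """Best-effort guess; the labels are illustrative, not a standard."""
    if high_bit_fraction(data) == 0:
        return "ascii"  # pure 7-bit data is valid ASCII and valid UTF-8
    if decodes_as_utf8(data):
        return "utf-8"
    return "8-bit plain text"
```

This stays a guess: a short run of high bytes from a single-byte encoding
can happen to form valid UTF-8, so longer files give a more reliable
answer than short ones.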

HTH,

Jeff M.
_______________________________________________
use-revolution mailing list
[email protected]
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution
