I suspect this is a frequent requirement, so I'm hoping someone may already have a standard routine for it.
I'm dealing with basically plain text files. But "basically plain text" means English with a few extras such as smart quotes and possibly some of the more common western European accented characters; in other words, characters that are outside ASCII but within Mac Roman and Windows Latin 1/ISO-8859-1.

My app gets these files from various sources, but they've all ended up on a Mac. However, some are Mac Roman and some are UTF-8. To date I've been loading all the files with URL "file:...", which of course mangles the UTF-8 ones. These particular files are UTF-8 with no BOM.

I could probably code a routine to deal with this particular case, e.g. by opportunistically searching for the UTF-8 byte sequence of a smart apostrophe that I happen to know will appear in all the instances I'm currently dealing with. But it made me wonder whether there is a general algorithm^H^H^H heuristic, I'd guess, for recognising the encoding of a file, and whether anyone has coded a general "load text file" routine: one that loads a file as binary, establishes the encoding, and normalises the content, so that it can be called on files in Mac Roman, Windows Latin 1, UTF-8, or UTF-16 with or without BOM, etc., returning the same result (or as close as possible) in each case.

If nobody has an actual routine that I can just steal, does anyone have tips for how to guess the encoding?

TIA,

- Ben

_______________________________________________
use-revolution mailing list
[email protected]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution
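[A minimal sketch of the heuristic described above. It is in Python rather than Revolution, and the function name and the Mac Roman fallback are assumptions for illustration, not anything from the thread. The useful property it relies on is that bytes from Mac Roman or Latin 1 text almost never happen to form valid UTF-8 multibyte sequences, so check for BOMs first, then try a strict UTF-8 decode, and only then fall back to an 8-bit encoding.]

```python
def detect_text_encoding(data: bytes) -> str:
    """Guess the encoding of 'basically plain text' file data.

    Order of checks: explicit BOMs first, then strict UTF-8
    validation, then a fallback to Mac Roman (assumed here to be
    the likeliest 8-bit encoding for these files).
    """
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8-sig"              # UTF-8 with BOM
    if data.startswith(b"\xff\xfe") or data.startswith(b"\xfe\xff"):
        return "utf-16"                 # codec consumes the BOM itself
    try:
        data.decode("utf-8", errors="strict")
        return "utf-8"                  # valid UTF-8 (ASCII also lands here)
    except UnicodeDecodeError:
        return "mac-roman"              # or "latin-1", depending on source

# Usage: read the file as binary, detect, then decode.
# with open(path, "rb") as f:
#     raw = f.read()
# text = raw.decode(detect_text_encoding(raw))
```

Note the known limitation: pure-ASCII data reports as UTF-8, which is harmless since ASCII decodes identically either way; and the Mac Roman vs. Latin 1 choice cannot be made from the bytes alone, so it has to come from knowledge of the file's origin.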
