Hi friends,

On 16.01.2013 at 18:15, Nishok Love <nishok.l...@virgin.net> wrote:

> ...
> So I'm still looking for a way for LiveCode to spot whether it's opening a
> file in UTF-8 or UTF-16 (or something else - aaarrgh!). Can I access the file
> header? read from file just gives me the data...

I found an old script that Mark Waddingham supplied in the past when I had some
problems reading vCards in 3.0 format (Unicode). I think it can be used to open
ANY text file.

I do not fully understand it, so I will leave it without comments of my own ;-)
In any case, it should convert any given text file to plain text that LiveCode
can read. Comments are from Mark W.

> I could read the file, count the number of characters and how many of them
> are spaces and from that I could infer which format is being used. Probably
> this would be reliable for my purposes - just not very elegant!
>
> Nishok

###############################################################################

-- vCards are stored as a text file, however, the text encoding used varies
-- depending on the program that exported them.
--
-- We use the following heuristic to detect encoding:
--   1) If there is the byte order mark 0xFEFF then we assume UTF-16BE
--   2) If there is the byte order mark 0xFFFE then we assume UTF-16LE
--   3) If the first byte is 0x00 then we assume UTF-16BE (compatibility
--      with Tiger Address Book)
--   4) Otherwise we assume UTF-8
--
function importVCard pFilename
   -- First load the vCard as binary data - at this stage we don't know
   -- the text encoding of the file and loading as text would cause
   -- inappropriate line ending conversion.
   local tBinaryVCard
   put url ("binfile:" & pFilename) into tBinaryVCard

   -- This variable will hold the vCard encoded in MacRoman (the default
   -- text encoding Revolution uses on Mac OS X)
   local tNativeVCard

   -- We now do our checks to detect text encoding
   local tTextEncoding
   if charToNum(char 1 of tBinaryVCard) is 0 then
      put "UTF16BE" into tTextEncoding
   else if charToNum(char 1 of tBinaryVCard) is 0xFE and charToNum(char 2 of tBinaryVCard) is 0xFF then
      delete char 1 to 2 of tBinaryVCard
      put "UTF16BE" into tTextEncoding
   else if charToNum(char 1 of tBinaryVCard) is 0xFF and charToNum(char 2 of tBinaryVCard) is 0xFE then
      delete char 1 to 2 of tBinaryVCard
      put "UTF16LE" into tTextEncoding
   else
      put "UTF8" into tTextEncoding
   end if

   if tTextEncoding begins with "UTF16" then
      -- Work out the processor's byte order
      local tHostByteOrder
      if the processor is "x86" then
         put "LE" into tHostByteOrder
      else
         put "BE" into tHostByteOrder
      end if

      -- If the byte orders don't match, switch the order of pairs of bytes
      if char -2 to -1 of tTextEncoding is not tHostByteOrder then
         repeat with x = 1 to the length of tBinaryVCard step 2
            get char x of tBinaryVCard
            put char x + 1 of tBinaryVCard into char x of tBinaryVCard
            put it into char x + 1 of tBinaryVCard
         end repeat
      end if

      -- Decode the UTF-16 to native
      put uniDecode(tBinaryVCard) into tNativeVCard
   else
      -- Use the standard uniDecode/uniEncode pair to decode the UTF-8 encoding
      put uniDecode(uniEncode(tBinaryVCard, "UTF8")) into tNativeVCard
   end if

   -- We now need to normalize line endings to make sure all lines terminate
   -- in 'return' (numToChar(10)).
   local tTextVCard
   put tNativeVCard into tTextVCard

   -- First replace Windows CR-LF style endings
   replace numToChar(13) & numToChar(10) with return in tTextVCard

   -- Now replace Mac OS CR style endings
   replace numToChar(13) with return in tTextVCard

   return tTextVCard
end importVCard

-- The Tiger version of Apple Address Book (4.0.4) exports vCard files
-- as UTF-16 big endian without a BOM if the record contains any non-ASCII
-- characters.
-- If there are no non-ASCII characters, the record is just left as
-- ASCII with no conversion to UTF-16.
-- On Leopard, it seems that Apple Address Book exports vCard files
-- in UTF-8 regardless.
function importAppleAddressVCard pFilename
   -- First load the vCard as binary data - at this stage we don't know
   -- the text encoding of the file and loading as text would cause
   -- inappropriate line ending conversion.
   local tBinaryVCard
   put url ("binfile:" & pFilename) into tBinaryVCard

   -- This variable will hold the vCard encoded in MacRoman (the default
   -- text encoding Revolution uses on Mac OS X)
   local tNativeVCard

   -- Okay so now we have the binary data, we need to decide if it is
   -- UTF-16BE or ASCII/UTF-8. This is easy to do since the first character of
   -- a vCard has to be an ASCII character. If the record has been encoded
   -- as UTF-16BE, then this means this will translate as the first byte
   -- being the NUL (0) character.
   if charToNum(char 1 of tBinaryVCard) is 0 then
      -- We are UTF-16BE

      -- We now know that tBinaryVCard is big endian UTF-16 since Revolution
      -- only handles host byte order UTF-16 at the moment we must byte-swap
      -- on Little Endian platforms
      if the processor is "x86" then
         repeat with x = 1 to the length of tBinaryVCard step 2
            get char x of tBinaryVCard
            put char x + 1 of tBinaryVCard into char x of tBinaryVCard
            put it into char x + 1 of tBinaryVCard
         end repeat
      end if

      -- We have UTF-16 in host byte order now, so use uniDecode to convert
      -- it to MacRoman
      put uniDecode(tBinaryVCard) into tNativeVCard

      -- We now have MacRoman text, but it still has Mac line endings, so
      -- replace CR with return
   else
      -- We are ASCII or UTF-8. Fortunately, as ASCII is a proper subset of
      -- UTF-8 we can just assume we have UTF-8 and convert this to native
      -- encoding
      put uniDecode(uniEncode(tBinaryVCard, "UTF8")) into tNativeVCard
   end if

   -- We now need to normalize line endings to make sure all lines terminate
   -- in 'return' (numToChar(10)).
   local tTextVCard
   put tNativeVCard into tTextVCard

   -- First replace Windows CR-LF style endings
   replace numToChar(13) & numToChar(10) with return in tTextVCard

   -- Now replace Mac OS CR style endings
   replace numToChar(13) with return in tTextVCard

   return tTextVCard
end importAppleAddressVCard

###############################################################################

Best

Klaus

--
Klaus Major
http://www.major-k.de
kl...@major.on-rev.com