On Aug 19, 2013, at 1:29 PM, J. Landman Gay wrote:

> On 8/19/13 2:15 PM, Devin Asay wrote:
>>
>> On Aug 19, 2013, at 1:03 PM, J. Landman Gay wrote:
>>
>>> I need to read and process a tab-delimited text file that is in
>>> UTF8 format containing unicode. The final goal is to get it into an
>>> array with the first tabbed item as the keys, preserving all
>>> unicode. There are some HTML format tags in it as well.
>>>
>>> If I read the file as binfile, carriage returns are all lost.
>>
>> Jacque,
>>
>> Where are the files coming from? Maybe they're using ASCII 13 as a
>> line terminator, or ASCII 10 + 13. Can't you replace whatever the
>> native line delimiter is with numToChar(10)?
>
> I forgot about that. They're ascii 13, and replacing them does keep the line
> breaks. Thanks.
>
> When I run uniEncode(tData,"UTF8") on it, the high-ascii characters are in
> the variable watcher as "+" and an unprintable box. Can I assume the real
> character is in there? Will it work for text chunking, etc? When I split it
> into an array, will the keys be intact?
I would do all of the chunking and splitting before you call uniEncode. Think of UTF8 as a reliable storage format, and convert the text only when you are ready to display it.

Devin

Devin Asay
Learn to code with LiveCode University
http://university.livecode.com
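
The "chunk first, convert last" advice above could be sketched in LiveCode roughly as follows. This is only an illustration, not code from the thread: it assumes a LiveCode version of that era (6.x or earlier), where uniEncode(tData,"UTF8") produces text suitable for a field's unicodeText, and the handler name loadTabbedUTF8, the parameter pFilePath, and the field "Display" are all hypothetical. Chunking the raw data is safe because tab and linefeed are single ASCII bytes that never appear inside a multi-byte UTF8 sequence.

```livecode
-- Hypothetical handler: names and the field "Display" are assumptions.
on loadTabbedUTF8 pFilePath
   local tData, tLine, tArray, tKey

   -- Read the raw bytes so the UTF8 sequences are left untouched.
   put URL ("binfile:" & pFilePath) into tData

   -- Normalise line endings to ASCII 10: CR+LF pairs first, then bare CR.
   replace numToChar(13) & numToChar(10) with numToChar(10) in tData
   replace numToChar(13) with numToChar(10) in tData

   -- Chunk and split while the data is still UTF8.
   set the itemDelimiter to tab
   repeat for each line tLine in tData
      if tLine is empty then next repeat
      -- First tabbed item is the key, the rest of the line is the value.
      put item 2 to -1 of tLine into tArray[item 1 of tLine]
   end repeat

   -- Convert only at display time. Keys and values are still UTF8 here.
   put line 1 of the keys of tArray into tKey
   set the unicodeText of field "Display" to uniEncode(tArray[tKey], "UTF8")
end loadTabbedUTF8
```

With this arrangement the array keys stay as UTF8 byte strings, so a key shown in a field would need the same uniEncode step as the values.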