Unicode in variables

2013-08-19 Thread J. Landman Gay
I need to read and process a tab-delimited text file that is in UTF8 format containing unicode. The final goal is to get it into an array with the first tabbed item as the keys, preserving all unicode. There are some HTML format tags in it as well. If I read the file as binfile, carriage
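A minimal sketch of the task described above. It is written in Python rather than LiveCode because the thread's scripts are only quoted in fragments. The steps are:

1. Decode the bytes as UTF-8.
2. Normalize CR+LF and lone CR line endings to LF, since reading a file as binfile does not translate them.
3. Use the first tab-delimited item of each line as the key.

It assumes the file is valid UTF-8 and that everything after the first tab, HTML tags included, is the value. `load_glossary` is a hypothetical name.

```python
def load_glossary(raw: bytes) -> dict:
    """Build {first tab-delimited item: rest of line} from UTF-8 bytes."""
    # "utf-8-sig" decodes UTF-8 and drops a leading byte order mark if present.
    text = raw.decode("utf-8-sig")
    # Normalize CRLF first, then any remaining lone CR, so each break becomes one LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    glossary = {}
    for line in text.split("\n"):
        if not line:
            continue  # skip blank lines, including the one after a trailing newline
        # Split on the first tab only; later tabs stay inside the definition.
        key, _, definition = line.partition("\t")
        glossary[key] = definition
    return glossary
```

Because the text stays decoded as Unicode throughout, keys such as "café" survive intact and can be compared directly with other decoded text.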

Re: Unicode in variables

2013-08-19 Thread Devin Asay
On Aug 19, 2013, at 1:03 PM, J. Landman Gay wrote: I need to read and process a tab-delimited text file that is in UTF8 format containing unicode. The final goal is to get it into an array with the first tabbed item as the keys, preserving all unicode. There are some HTML format tags in

Re: Unicode in variables

2013-08-19 Thread Richmond
On 08/19/2013 10:03 PM, J. Landman Gay wrote: I need to read and process a tab-delimited text file that is in UTF8 format containing unicode. The final goal is to get it into an array with the first tabbed item as the keys, preserving all unicode. There are some HTML format tags in it as well.

Re: Unicode in variables

2013-08-19 Thread Richmond
LF: Line Feed, U+000A
VT: Vertical Tab, U+000B
FF: Form Feed, U+000C
CR: Carriage Return, U+000D
CR+LF: CR (U+000D) followed by LF (U+000A)
NEL: Next Line, U+0085
LS: Line Separator, U+2028
PS: Paragraph Separator, U+2029
I have a feeling that a search and replace routine
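A search-and-replace routine of the kind Richmond suggests could map every terminator in his list to a single LF before splitting. The sketch below is in Python. It assumes LF is the desired target and that VT and FF should count as line breaks, as his list implies. `normalize_line_endings` is a hypothetical name. CR+LF has to be matched as one unit before lone CR; otherwise each CR+LF would turn into two line breaks.

```python
import re

# CRLF comes first in the alternation so it is consumed as a single match
# instead of CR and LF being replaced separately.
_LINE_BREAKS = re.compile("\r\n|[\n\x0b\x0c\r\x85\u2028\u2029]")


def normalize_line_endings(text: str) -> str:
    """Replace LF, VT, FF, CR, CR+LF, NEL, LS and PS with a single LF each."""
    return _LINE_BREAKS.sub("\n", text)
```

A single regular-expression pass scans the text once, which matters for a large data set compared with stepping through it character by character.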

Re: Unicode in variables

2013-08-19 Thread J. Landman Gay
On 8/19/13 2:15 PM, Devin Asay wrote: On Aug 19, 2013, at 1:03 PM, J. Landman Gay wrote: I need to read and process a tab-delimited text file that is in UTF8 format containing unicode. The final goal is to get it into an array with the first tabbed item as the keys, preserving all unicode.

Re: Unicode in variables

2013-08-19 Thread J. Landman Gay
On 8/19/13 2:21 PM, Richmond wrote: LF: Line Feed, U+000A VT: Vertical Tab, U+000B FF: Form Feed, U+000C CR: Carriage Return, U+000D CR+LF: CR (U+000D) followed by LF (U+000A) NEL: Next Line, U+0085 LS: Line Separator, U+2028 PS: Paragraph Separator, U+2029 I have a

Re: Unicode in variables

2013-08-19 Thread Richmond
On 08/19/2013 10:31 PM, J. Landman Gay wrote: On 8/19/13 2:21 PM, Richmond wrote: LF: Line Feed, U+000A VT: Vertical Tab, U+000B FF: Form Feed, U+000C CR: Carriage Return, U+000D CR+LF: CR (U+000D) followed by LF (U+000A) NEL: Next Line, U+0085 LS: Line Separator, U+2028

Re: Unicode in variables

2013-08-19 Thread Devin Asay
On Aug 19, 2013, at 1:29 PM, J. Landman Gay wrote: On 8/19/13 2:15 PM, Devin Asay wrote: On Aug 19, 2013, at 1:03 PM, J. Landman Gay wrote: I need to read and process a tab-delimited text file that is in UTF8 format containing unicode. The final goal is to get it into an array with the

Re: Unicode in variables

2013-08-19 Thread J. Landman Gay
On 8/19/13 2:43 PM, Devin Asay wrote: When I run uniEncode(tData,UTF8) on it, the high-ascii characters are in the variable watcher as + and an unprintable box. Can I assume the real character is in there? Will it work for text chunking, etc? When I split it into an array, will the keys be

Re: Unicode in variables

2013-08-19 Thread Devin Asay
On Aug 19, 2013, at 1:59 PM, J. Landman Gay wrote: On 8/19/13 2:43 PM, Devin Asay wrote: When I run uniEncode(tData,UTF8) on it, the high-ascii characters are in the variable watcher as + and an unprintable box. Can I assume the real character is in there? Will it work for text chunking,

Re: Unicode in variables

2013-08-19 Thread in...@kenjikojima.com
This is a Unicode array. Go to this URL: http://kenjikojima.com/livecode/download/unicodeArray.livecode I hope it helps, -- Kenji Kojima / 小島健治 http://www.kenjikojima.com/

Re: Unicode in variables

2013-08-19 Thread J. Landman Gay
On 8/19/13 3:08 PM, in...@kenjikojima.com wrote: This is a Unicode array. Go to this URL: http://kenjikojima.com/livecode/download/unicodeArray.livecode I hope it helps, Thank you. I will try this if I have to, but the data is very large and stepping through each character would take a long time.

Re: Unicode in variables

2013-08-19 Thread J. Landman Gay
On 8/19/13 3:07 PM, Devin Asay wrote: On Aug 19, 2013, at 1:59 PM, J. Landman Gay wrote: On 8/19/13 2:43 PM, Devin Asay wrote: When I run uniEncode(tData,UTF8) on it, the high-ascii characters are in the variable watcher as + and an unprintable box. Can I assume the real character is in

Re: Unicode in variables

2013-08-19 Thread Devin Asay
On Aug 19, 2013, at 2:22 PM, J. Landman Gay wrote: On 8/19/13 3:07 PM, Devin Asay wrote: Something like this should work: User clicks term to look up. get the text of the click line -- this will be displayed as UTF16 Sorry, that should have been get the unicodeText of the clickLine

Re: Unicode in variables

2013-08-19 Thread Richard Gaskin
Jacque wrote: Basically, I'm storing a glossary. The keys are the glossary terms, some of which are unicode. The definitions are the elements. The user points to a word in a field and I need to retrieve the definition by matching the displayed field text (which is unicodetext) with the glossary

Re: Unicode in variables

2013-08-19 Thread J. Landman Gay
On 8/19/13 3:41 PM, Richard Gaskin wrote: Jacque wrote: Basically, I'm storing a glossary. The keys are the glossary terms, some of which are unicode. The definitions are the elements. The user points to a word in a field and I need to retrieve the definition by matching the displayed field

Re: Unicode in variables

2013-08-19 Thread Monte Goulding
On 20/08/2013, at 6:41 AM, Richard Gaskin ambassa...@fourthworld.com wrote: In my experience that's more strict than it needs to be, but if the format of encoded arrays is any clue there may still be a restriction on having NULL bytes in a key name. Which would count out utf16. Jacque why
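Monte's concern about NULL bytes can be seen directly by comparing encodings. In UTF-16, every ASCII or Latin-1 character is stored as two bytes, one of which is 0x00. In UTF-8, a 0x00 byte appears only for U+0000 itself, because every byte of a multi-byte sequence is 0x80 or higher.

The Python sketch below illustrates this. Little-endian UTF-16 is an assumption here, and `contains_nul` and `key_bytes` are hypothetical helpers.

```python
def contains_nul(data: bytes) -> bool:
    """True if the byte string contains a NULL (0x00) byte."""
    return b"\x00" in data


def key_bytes(term: str) -> dict:
    """Encode the same term as UTF-16 (little-endian assumed, no BOM) and as UTF-8."""
    return {
        "utf16": term.encode("utf-16-le"),
        "utf8": term.encode("utf-8"),
    }


for term in ["Tab", "café", "小島"]:
    enc = key_bytes(term)
    print(term, "UTF-16 has NUL:", contains_nul(enc["utf16"]),
          "UTF-8 has NUL:", contains_nul(enc["utf8"]))
```

Note that a purely CJK term such as "小島" may happen to contain no 0x00 byte in UTF-16. Latin text, including the ASCII parts of most glossary terms, always does.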

Re: Unicode in variables

2013-08-19 Thread Peter Haworth
On Mon, Aug 19, 2013 at 1:22 PM, J. Landman Gay jac...@hyperactivesw.com wrote: Thanks, I'll try it. The glossary is only one piece of a much bigger data set involving a lot of different types of lookups, and this is going to be a huge pain. I'm going to have to rewrite a large part of the

Re: Unicode in variables

2013-08-19 Thread J. Landman Gay
On 8/19/13 4:24 PM, Monte Goulding wrote: Jacque why not uniDecode(theKey,UTF8) in order to use it as a key in the array? It drops or alters characters. The glossary is used a few different ways, and sometimes I need to display all the keys in a field, to act as an index of terms. So it

Re: Unicode in variables

2013-08-19 Thread Monte Goulding
On 20/08/2013, at 8:50 AM, J. Landman Gay wrote: It drops or alters characters. I'm not sure what you mean. UTF8 can represent all the unicode code points. The glossary is used a few different ways, and sometimes I need to display all the keys in a field, to act as an index of terms. So it
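Monte's point that UTF-8 represents every code point suggests a lookup flow:

- Keep the array keys as UTF-8.
- Convert text taken from a UTF-16 field to UTF-8 before looking it up.
- Convert keys back to UTF-16 when listing them in a field as an index of terms.

A Python sketch of that round trip follows. It assumes little-endian UTF-16 field text, and all three function names are hypothetical. It does not model LiveCode's own conversion functions.

```python
from typing import Dict, Optional


def utf16_to_utf8(field_text: bytes) -> bytes:
    """Re-encode UTF-16 (little-endian assumed) bytes as UTF-8 without losing characters."""
    return field_text.decode("utf-16-le").encode("utf-8")


def utf8_to_utf16(key: bytes) -> bytes:
    """Inverse conversion, e.g. for displaying UTF-8 keys in a UTF-16 field."""
    return key.decode("utf-8").encode("utf-16-le")


def lookup(glossary: Dict[bytes, str], clicked_utf16: bytes) -> Optional[str]:
    """Find the definition whose UTF-8 key matches text taken from a UTF-16 field."""
    return glossary.get(utf16_to_utf8(clicked_utf16))
```

If characters go missing along the way, the likely culprit is a conversion to a single-byte encoding somewhere in the chain. The UTF-16 to UTF-8 round trip itself is lossless, even for characters outside the Basic Multilingual Plane.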

Re: Unicode in variables

2013-08-19 Thread J. Landman Gay
On 8/19/13 5:27 PM, Peter Haworth wrote: If you have to rewrite your code base and there's lots of different lookups involved, maybe an SQLite database would work? The lookups search custom properties and script variables mostly. But I'm not sure that moving the problem to a database would