Problem solved!
Many thanks for the help.
Ronan
On 12/10/2006 13:30, "Chris Burke" <[EMAIL PROTECTED]> wrote:
> Ronan Reilly wrote:
>> That clarifies things. However, I'm finding that the text strings I read
>> from the file are not getting converted by applying utf8. It seems that
>> they have to be datatype unicode for this to work, but they are read in and
>> stored as literals. Is there any way of coercing datatypes in J to get
>> around this?
>
> Your data is probably in the old 8-bit ANSI encoding (aka ISO-8859-1 or
> Latin-1). To convert this to utf8, use the verb fix defined below.
>
> For example:
>
> NB. a umlaut is 228 { a. in ISO-8859-1
>
> a=. 65 107 116 117 97 108 105 116 228 116 { a.
>
> NB. a umlaut is 195 164 { a. in utf8
>
> a. i. fix a
> 65 107 116 117 97 108 105 116 195 164 116
>
> fix=: 3 : 0
> val=. a. i. y
> msk=. 127 < val
> uni=. 192 128 +"1 [ 0 64 #: msk # val
> val=. val #~ 1 j. msk
> ndx=. I. 127 < val
> dat=. a. {~ uni (ndx +/ 0 1) } val
> )
>
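
For anyone reading along, here is a rough trace of the arithmetic inside fix on a three-byte sample. It is only a sketch of my reading of the verb, not a replacement for it. The names wide and res are mine and do not appear in fix. It relies on ISO-8859-1 byte values being equal to Unicode code points, so each byte v above 127 becomes the UTF-8 pair (192 + v divided by 64, rounded down) and (128 + v modulo 64).

```j
NB. Trace of the steps in fix on a short sample (illustrative names only).
val=. 97 228 116                        NB. 'a', a umlaut in ISO-8859-1, 't'
msk=. 127 < val                         NB. 0 1 0 : marks the non-ASCII bytes
uni=. 192 128 +"1 [ 0 64 #: msk # val   NB. 1 2 $ 195 164 : lead and continuation byte
wide=. val #~ 1 j. msk                  NB. 97 228 0 116 : one fill slot after each high byte
ndx=. I. 127 < wide                     NB. 1 : position of each high byte in the widened list
res=. uni (ndx +/ 0 1) } wide           NB. 97 195 164 116 : byte pairs amended into place
```

The result matches the example above, where 228 became 195 164. Since only bytes 128 to 255 can occur in the input, two output bytes per high byte are always enough.
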
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
>
--
Professor Ronan Reilly
Head of Department
Department of Computer Science
NUI Maynooth
Maynooth
Co. Kildare
IRELAND
t: +353-1-7083847
e: [EMAIL PROTECTED]
w: http://www.cs.nuim.ie; http://cortex.cs.nuim.ie