Ronan Reilly wrote:
> That clarifies things. However, I'm finding that the text strings I read
> from the file are not getting converted by applying utf8. It seems that
> they have to be datatype unicode for this to work, but they are read in and
> stored as literals. Is there any way of coercing datatypes in J to get
> around this?
Your data is probably in the old 8-bit ANSI encoding (aka ISO-8859-1 or
Latin1). To convert it to utf8, use the verb fix defined below. Only bytes
above 127 need changing: each of them becomes two bytes in utf8.
For example:
   NB. a umlaut is 228 { a. in ISO-8859-1
   a=. 65 107 116 117 97 108 105 116 228 116 { a.
   NB. a umlaut is 195 164 { a. in utf8
   a. i. fix a
65 107 116 117 97 108 105 116 195 164 116
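fix=: 3 : 0
val=. a. i. y                          NB. byte values of the input
msk=. 127 < val                        NB. marks bytes outside 7-bit ASCII
uni=. 192 128 +"1 [ 0 64 #: msk # val  NB. utf8 byte pair for each marked byte
val=. val #~ 1 j. msk                  NB. add a fill item (0) after each marked byte
ndx=. I. 127 < val                     NB. positions of marked bytes after expansion
dat=. a. {~ uni (ndx +/ 0 1) } val     NB. overwrite byte and fill item, back to chars
)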
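
To see what fix does to a single byte, here is a small sketch of the
individual steps run on their own. The names split, pair, wide and out are
mine, for illustration only; they are not part of fix. The values in the
comments follow from 228 = (3 * 64) + 36.

```j
NB. A Latin1 byte above 127 becomes two utf8 bytes.
NB. 0 64 #: splits the value into a quotient and a 6-bit remainder.
split=: 0 64 #: 228             NB. 3 36
pair=: 192 128 + split          NB. 195 164, the utf8 bytes for a umlaut

NB. 1 j. msk copies every item once and adds a fill item (0)
NB. after each item where msk is 1.
val=: 65 228 116
wide=: val #~ 1 j. 127 < val    NB. 65 228 0 116
ndx=: I. 127 < wide             NB. 1
out=: pair (ndx +/ 0 1) } wide  NB. 65 195 164 116
```

The fill item makes room for the second byte, so the amend can write both
bytes of each pair in one step.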
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm