Ronan Reilly wrote:
> That clarifies things.  However, I'm finding that the text strings I read
> from the file are not getting converted by applying utf8.  It seems that
> they have to be datatype unicode for this to work, but they are read in and
> stored as literals.  Is there any way of coercing datatypes in J to get
> around this?   

Your data is probably in the old 8-bit ANSI encoding (ISO-8859-1, also
known as Latin-1). To convert it to UTF-8, use the verb fix defined below.

For example:

NB. a umlaut is 228 { a. in ISO-8859-1

   a=. 65 107 116 117 97 108 105 116 228 116 { a.

NB. a umlaut is 195 164 { a. in utf8

   a. i. fix a
65 107 116 117 97 108 105 116 195 164 116

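NB. fix: convert an ISO-8859-1 (Latin-1) string to UTF-8
fix=: 3 : 0
val=. a. i. y                           NB. byte values of the input
msk=. 127 < val                         NB. 1 where a byte is non-ASCII
uni=. 192 128 +"1 [ 0 64 #: msk # val   NB. two-byte UTF-8 pair for each non-ASCII byte
val=. val #~ 1 j. msk                   NB. insert a 0 after each non-ASCII byte to make room
ndx=. I. 127 < val                      NB. positions of the non-ASCII bytes in the widened list
dat=. a. {~ uni (ndx +/ 0 1) } val      NB. write each pair into its two slots; this is the result
)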
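
The two less obvious steps in fix can be tried on their own in a session.
This is only a sketch of the arithmetic, and it assumes that every byte
above 127 is a Latin-1 character, so that it always needs exactly two
UTF-8 bytes:

```j
   NB. split 228 into its high bits and its low 6 bits
   0 64 #: 228
3 36
   NB. add the UTF-8 lead-byte and continuation-byte markers
   192 128 + 0 64 #: 228
195 164
   NB. a complex left argument to # copies each item once (real part)
   NB. and then appends that many fill zeros (imaginary part)
   (1 j. 0 1 0) # 97 228 116
97 228 0 116
```

The zero inserted after 228 is the slot that the amend in the last line
of fix overwrites with the second byte of the pair.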

